From 4bb9edd2ba6d00e8c432ce302b71bb7452f34a6a Mon Sep 17 00:00:00 2001 From: Kathy Date: Thu, 1 Sep 2022 13:34:36 -0400 Subject: [PATCH 01/17] update readme, minor --- README.md | 20 ++++++++++++++------ images/logo.png | Bin 0 -> 59004 bytes 2 files changed, 14 insertions(+), 6 deletions(-) create mode 100644 images/logo.png diff --git a/README.md b/README.md index 5d87880..665710c 100755 --- a/README.md +++ b/README.md @@ -1,8 +1,14 @@ +![logo](images/logo.png) + +-- + # Sei framework Welcome to the Sei framework repository! Sei is a framework for systematically predicting sequence regulatory activities and applying sequence information to human genetics data. Sei provides a global map from any sequence to regulatory activities, as represented by 40 sequence classes, and each sequence class integrates predictions for 21,907 chromatin profiles (transcription factor, histone marks, and chromatin accessibility profiles across a wide range of cell types). -This repository can be used to run the Sei model and get the Sei chromatin profile and sequence class predictions for an input VCF file. +Sei is now published, you can read the manuscript [here](https://doi.org/10.1038/s41588-022-01102-2). + +This repository can be used to run the Sei model and get the Sei chromatin profile and sequence class predictions for input sequences or variants. We also provide information and instructions for [how to train the Sei deep learning sequence model](#training). @@ -54,9 +60,7 @@ The following files and directories will be outputted: The two `*.tsv` files are the final formatted outputs, while the `chromatin-profiles-hdf5` directory contains the intermediate HDF5 and row label files outputted from Selene from running the Sei deep learning model. -You can use the HDF5 files directly if desired, but please keep in mind that the variants will not be ordered in the same way as the TSV files. (Please see the corresponding `*_row_labels.txt` file, for the variant labels.) - - +You can use the HDF5 files directly if desired, but please keep in mind that the variants will not be ordered in the same way as the TSV files. (Please see the corresponding `*_row_labels.txt` files in `chromatin-profiles-hdf5`, for the variant labels.) ### Sequence classes @@ -111,14 +115,18 @@ Sequence classes are defined based on 30 million sequences tiling the genome and The configuration file and script for running train is under the `train` directory. To run Sei deep learning sequence model training, you will need GPU computing capability (we run training on 4x Tesla V100 GPUs connected with NVLink). -The training data is available [here](https://doi.org/10.5281/zenodo.4907037) should be downloaded and extracted into the `train` directory. **NOTE**: because the Sei training data contains processed files from the Cistrome Project, please first agree to the Cistrome Project [terms of usage](http://cistrome.org/db/#/bdown) before downloading the data: +The training data is available [here](https://doi.org/10.5281/zenodo.4907037) should be downloaded and extracted into the `train` directory. + +**NOTE**: because the Sei training data contains processed files from the Cistrome Project, please first agree to the Cistrome Project [terms of usage](http://cistrome.org/db/#/bdown) before downloading the data: ``` cd ./train sh ./download_data.sh # in the train directory ``` -The Sei training configuration YAML file is provided as the `train/train.yml` file. You can read more about the Selene command-line interface and configuration file formatting [here](https://selene.flatironinstitute.org/master/overview/cli.html#). You must use Selene version >0.5.0 to train this model ([release notes](https://github.com/FunctionLab/selene/blob/master/RELEASE_NOTES.md)). +The Sei training configuration YAML file is provided as the `train/train.yml` file. You can read more about the Selene command-line interface and configuration file formatting [here](https://selene.flatironinstitute.org/master/overview/cli.html#). + +You must use Selene version >0.5.0 to train this model ([release notes](https://github.com/FunctionLab/selene/blob/master/RELEASE_NOTES.md)). We also provide an example SLURM script `train.sh` for submitting a training job to a cluster. diff --git a/images/logo.png b/images/logo.png new file mode 100644 index 0000000000000000000000000000000000000000..4b21ccb513b8be43984b307ea3a16737700ac895 GIT binary patch literal 59004 zcmaHSc|4Tu_y3JTB|?!9DUm(0Cn{Tnh-{60&AyH7q_S^i-?Jq9zKwks*<;2sRCZ&@ zzVo|=r=I8Y{o~i`<#}GOnYr)lI_JF4`@GM&?%VH~ywt_>*Um!_bn%I_xDo`NML`gr z2O&QAr0YGx0Q~QqwY0h&1aaKN{^7h%7llI*J@iEUq4MkaMYM-C&9>9*Zbk+*51tN0 zL@0fRvEy3XGvX(g2tI@nVw7BS9Cm}_YO+;eKQ5%#=r!2dN?EwaSI3PMaXQ%kT3e%O z8<%BP{UJ9Tm@^g9&ZPZS6O*$n|6u)>v^7}F zp%R}><#@_4D$*G8Fcs1pIs_WNp0uam&2bMsw^0-)_G=IKoysO5&%WLO&Z;NS?KUK09j z@Gh5~QcPj(=+)?i3idmM^%wsx|28w&E9RpNaxZb-Y1E>{2^#xrkhCX7pw}?*q^>95 z-{py3{k$5zRAC{&1pWOi&}8#!G;MDKaml$qOZMKdpusC9$Au@${auMK8>|#2)8;JP zT5sMqu+E1g+ZHI_1~0_?|NC&n%u+%7{UHaZpq2`rBS~3_NUUzCL{sk+TZ)* z; z;GHg3CA|YoyHtQ@@i%Pdslca_;xg}LP2Za#pybiNKc-ewXmQF6^o$`5OZjOV`+3$v z2<63h^G`6uL6j0hF%8Zmm7gy_&VNJUtkKa1jK}W#!k>Nj+93>yz7by~!Lx_{?9W*; zD#&XXxU(1MkB8=*VPG>Sh~R~t`OoDM&N@Wj;B#68LT~>-5!M1g;@^IQG=?m!diYhK zwHzh1asH22?a!Wokru<)lpZZfAl!T7&l=LX3s2|c6 zs~goduMmXEQ3g*bPlEuNOZ1Fp$RR_y1ATCW0{t)S>5F!FXr=iWAotZGzg9H{Oe?Cp_zh}U{6{u z&%dk8f|bvnEb})p{65bvsgSDk!lXW^7!TsjasBaZnwJ^E^NbzV+bn-S!?8n^gPTBe zDu3OE2fT5v_4U>IOMiZKD>1QLkoQ8?|lk5l(a9v!~c6bIqdDu!45su|89jGyOm$q?K%AY z?4pPjXMnFUBKWP&eIlK|uIzs=xOnBvqNncPA5vp4{`tQbpM3UX-pOUqo>VOVPezDk zDSJ(!)n9$}P4SXga@0{2_Jn4DQR*q698u$SEy9N_j<9%NP@vXnfp(NEa znz&)vUih=VGnNi@Xzf^u($KY-VW^h6kyVfB34xl{Q;>;b2sHrEDP%!$Esm+UD0+=^y@$3LJz(s2ya7 zC+2@YGf)8r%rihQXQMeFC{+Gq$b;u(VS=Pidl5Y}Gx#K6r=Cx+7h?&|lNuNgnG6;A zKJ{zAIaqK~Z327Ox3V@`&DXSc9M34d16K5gUxZr0K~!v*2~Ng_z*t>;IGA&nPCjM(!5a;k0KFv zC*$d}O9P+HOn5*bAWJN|E|HRA=DrGoKgp^d)$-SDoD4opbZ#^4Q>i^un$NCbZ~VOz z_>TJSb$n@hfN?Q7E%9J=DTV7r!HGHiS0`9>@^x`ouy>aS+0rWsjD^Oq+5T+u#qs@H z5y>^mZDxESzQciMzhLhslo{973@Xg6OCSzn;^G_RB$d?e?EdX<95$wY!Vtrlng;#T z8l^kTm2jCHEEqQU9C;n>aM17;%g2YW5(ljpa14m@QgK_GPD8P??` zvi#k;jv)Va%&;yGf@A#K1KM@LqG!RM-sau%uV#hN74S>3P8H(_$H8h_0LRZ}|14g` z$1a3^^5oQ2uNn;1J+aa%c@V_0QcIi|hB)ZnpH#cD^}1^W_j`;; z;)y~KN0>b&0wLGIuHSVH;q+W%iPhyOhwd$7ZfUU(W=3Mlogx`4XH0NFK-s52_SSD< zJru`#xxh!U`ZfdEdqvZl#U4q;mt#x3RncT&`fOrn!3Di+pp5p5But>!R&eYYvEDMe zh^{tGPj&Pd5?a}92%IJO=(1?UDPw5x?UP76mafFCur9*|p*V-~I@$i1nCR-W$NW}` zG%oxi6fFeE&ZC!CTP9uDW}i9WENE5z!uoL|(K5P8zJmA`4T!f6t7Q#=WM3Cfm9Ov6 z$m*dEVsUo>tdkVl(SG&RmM?LM<*)(X5>ITd3!iwY#bQb)eIxQlm9I2S_UdMlDCL=- z4I!U85h-Z3SdC9+0mh@?^Jxr$1`XcLDq`RdNh`_%bq(@ zy427vQr}6xVAzwi>0MCb!}#T&-(!hZPVY!46c-!nrHak#SKn_s7EBzBd@0e|GEQZ- zEF(`i-9P-pkl^B(MX~eP8d5e1UwEW}sh%a^Nu+za{3-6AAC-YBkW6)2^mQCh?)e?f z{&TnC6)VS8iPZ+I3nf=yf6UN1O$O>YCl?iQjkUydZ(uU?g#;TaSzzk@yVm4crEpi# zXbzX7jq8ZvQyA-{mIV3f-X{tr!`?gLyW4Zhwyb%<#nE^xR{*!O4__C2eg2OVg8(bx zV~O_}9TsNt3&g0>A;oOYyi)iZuGp8H`#dt4MikLSCocDt-orHm@+%x$SQeXex9gdM zHAfn1!<4erw%(|V63WUYrj$VrDenxi*pfa6VEiGm*SS=P|N1{JNl$v+p1Y6YvoEj` zsv|$ae^+ZE%RL+a6y---kJ&jcf(ylj(hP}RG#(H_^jHhfgAR6-Z5?KgHy9XX<{u5O z8z0iM@m8MM>;*K)c529+AVAH6cs+3%A=W$me&Y{vRLVdCOu3EKxct?S-jZzG1+8Eg&K`iz&2Pr}g4 zLneCrwglZ72&*OPmlrOeQo9Kjsq@^dfm7cr-XCx7G=6&m1F?K8=r6Q3L3coZ0HGB1 z1LV_l0gP+Qy7%3=U|R?C;P$7%+Q0|ss>15VwXFqV zd1o^dbDJ|TMcm;HbGpAFXY|E4zjeGh4NtAZW|MhSHXMZ~O$zK26H#*cnDGw?T#9XS z&R?uO%~7#!8(>_}_u3U)XROTEQ3Xg`K=fUBW8hF=l@`tsF5-br)W3I@0NRt~D^W34 zk(gjH6?mtUpWydiDtFRzYW+g*T6C|HI6GhQF{K6+y#xp$Y{czsA5Z_671hESs=9sQ) zTnU2-vy=>FF6ia^>Hmq*Ev{3Xgkr7Z19~o0344L$qVqsTxXcprTIqjc)V@P`YQX$t zEQbzOr9!b-sF=L18pW~k$J@G=u3Wfyw)o+t>*hZU8Ba@i=rzC@>A_a>@HdI&^Ew1d z*xK0Z3$xnm7h*8t$t@*++c)Jr92cao@@3ty^z0X(Q;4O#0c8C|@*)%q^5b*~-K!a| zzRp{Q_rEKe?b&;%_UebB9Yaz531!o-0w@r!BfXTd^!{B=bTyQTAzPR>wPeAcc&n?C ziv^~+J(I$m=h(gXR#~$3TWK*6R}kc4J7SOXAuyCq>?1aKK_i7#41_6p3px|hf3oUH zZt3&cPGVKcWNvQkJ{3_dhLG4)EgQ6RMKF4SwrV>vc|A7>nlA3Y;+hLV(J=#!;=ErW!>}U2w?v=pAtC#i4{?vU%-OXXuAVW2{9J)R8{r` zZDt89w@gch5J=HD1)OvlA(5oOM}E(Bw}&ck{4S#iKL1Ez}M{6m~}g;OZK$r#eV?YOA4ke6k+l;A)q?- zGGbBEHR9BlY1B6_|4S_>KkBfOD-a%6-hG0@WVwFkP3HH*jyUSFTEvj6tsWDa^s)G z-m(_9rYW1{N;K>0a$`7$^L1DxrS%Rx48^U)MFjiu7KM!Isn| ztp|vlSqu0J92mpR2!$DnoOcq4pAZxTYD%xg-++xw_uk256GJZvVACVWX4KUk43aoL z^o58n`n7M1PZg9Fyl&P|OQR-+p5}couXZGgnW^EYkY9yE{Z1 zPD(o5+rq)16yDJuO{K4b1N9q>E(b9C&+A!v{TpOQfzysa?1J)aYI$zw%brnCabRgf z4IF!`J_=tq8Ht@pD^vwpn9jJRs_^Exiq)O$~u=+W;Bwlx7Y zMj(Fs%57P=P?{fzt_J3dx5EZeZs{b!&+j0y6=f`tng2lN31G@m`&Xa>!;I_*s@l6? ziwqI`^HLyC<7s%1WipTuv{Xv=FKQ5abqe!v>t>>trC8YE! zn;0IJTZJ;aEj|@pTA=pIi5K!L8R6Cm`4#P1^Y~x1%v?HMILJoR^#K+rC5H92vXoWQ z4td!CC|_tLa3G~N^g;tOgU(6`SD*}XsHT$HzkI)p>NLPYfSlCDHFgtQEq+!)x0>O! ziwwe!g-Dtr3JOYKxzJB2x!hvZ?^9{PTuBoscReIV_eG<>YY1!7G@hp1gWUKgc`&GY zS9TGFnbXU^13KZMMuyG-C|PQ$dhVVU##-=;f6C4e;O~p6uU+FFpH|kzZx(}o8`bC{ zI~?iY1R<2f0!EKA)4q`wI88kfAFAm@FFefUrCr1~&0Vjhdm1I;;T&fPKQ(yNPSgH6 z*5Fe8`f)Kzmo8m1-BrSy`cN{HMfqGf@zxpNKUH=lOY@Y8>nX6RBFIpG)JD^q7&=#? zNl+zy2wUWRuB7hSf$YLbF7FKsv;Gf>kFZFe7HI3@7vFSe56ODd;TKVHb6}tD=rIko zc+)HvNNfSrciVF3`-CYk>|K)Ig)6lbQGUaJ_}V%k!|cK{5{DBFNuN3PHXct5cW|JA zqyIIC(DcQ3z3w9u>C;kOg`nVXI!8jb|* zwkDlq*PM+@Ow)2%79sTYxnq588wWhBgL(sm`PWZuD;a982{&pkObHM<`(KE}B_@6V zE!$TyQ!Epi=6yoqN>@mNNN)|bXz>`cvt8Hc5QgmO4x1lq;xB5LAyCh6`Tz0;l9Vs- z*Q%Vtm3OC$KLnX6(QLLh9q5aLP-zCVj;+|oz-C-5&dkPW3fj&rntm>YAJE5ZoOC@w9@?i; zqNaJN*9=ON<^k!Hi#f@OyJosJc8)X;7A-rJhT8o`!1>ap)Rqk1^cp-Wgd>mdz?e@a59x9Wr$Z0)H;aKn zNGWUHvqtVuQeZE_&iLj6|8fUE{=$hs5G%JGwP*7TACU!s+DD)ySgg9Sj4=*SzBjhP z+s^Diy7`HXXh36DL^OK!Ku5`JNtw>*q;o(7Czh+GA5|`ZX^nGzBCymESUM4}c_CoC zeDxaSIW#F*L?H23jo1FvJvDB$g8c!pnEz6#H!Q6NY?22=12{I!DGv(89hAu6cZH}W zNof~F#Qw@V>OU~B{553r4Yd9>a7x_zY7v4(T0mAYm%{YHJ(LOZ&O#_*1jh2%ZVk1U1%T&b@u9ENmIo57v{YGm z@9+HWu{J+d`_jaKapC@nRPhYAb#zK3h9^svWC7E$`O8S^1attK!wv%S+#9ogO!XFP z_tXCtmn@PV>GU`^y$mFFq;#%iG8$;-D%vbOPhIv4sEb&13H<;7elgFgOB=Y~mQva#bXlfcCCC<%iq#?O`nvN$?wtN>#|Sd6v* z!5EF>B&W>$lBHOm{=c%H{l3UOFxQMiR(fi%6uO|DH+|%Lz4yb^Up$#CX)`%GT zYSR>yj%`v_`u~F)X~P5RlN1T1Smp~MFTfK?8w8m(15lTx%~KB2gJ=kXDwf|<%jT-v z`^Wu@9FEf!oz3}u&`SoS(6?7k^_CR@H8WTpB?|Jmo(-_ia_dB@fKlI_vb4YK@#>#` zF4P3*&xkl4uKYa#&+C1`#BkDLEby=;gW3+_!k{_eLa4$xfftj2w>XGW&z*|-4?XB< zKsb5qI4+OCHpOqJkrnksbDV*kQB^oDKrsfRP5*@o3n$p8#;d8F{^mlf+O3Aex^>lb z?0;M*$-!{ghX1hpz#+L!z~cV)9n#9WpxJy?E%9YwxGWG>NqjIOXoFsrWr79w+UeA* ztfvrdUZ#-N-*vRXnj~0_&XG;d2%^zuk(rL8bIQCIE4|s{x>Shw?&VSWgN^a9V`p2} zCF3oUE42p)m8|<7v-z(sd_gZHCB7sfh|I3*e5Wv>>2e){UY75Uf9F*^1KF|VI_%5W z7VP*cTdsZ7Qx`vaDtcs1hQQz?GpR`_$osYzj;!okQHdDjT}xBBt)DwxG(1q2piUk} zDLxgW`oIn~y5u0nkGd2e;2WOhVY!JaBff0AOE4jN*ZQ^X%Oy^Oi5g1xO(my|X2%{$ zS>qS^e9!t@A@|9}VU4RRu$_$>`xP9>`TeG{>(e}sibFvWE4!rg8=-K5MV=3V99u#6 zNu4xAkGRlQQ16YQzK?aV{TNSLmpJ}GK5@f?ITI(W*J53{bd=7=ovWYoZ?oS#X{!A z>l2$fwyvqN+|=J$Acji995xJcQ@uTD*(U|IFu)OJ>O-syYgmXp{r$d7=8?5lNJzDn z`Nq$r@zyJrf1U*pVcx{YP$$4v-%~_&3r@K2+BW4c6*#u_wewBB;&9#5T3*G0X!S*| z9GCgt&z)VOe2Tp$JO*%l=;Naq;mn5ahQG=ZA)78TJ|SmDf>S$sN>sx@>V zaQtnePGrreC#mZD`>?yWD+Wc?$InA1)Zpy&p?G57I$vs!=kQ zI?2@Fl!HQIij*|$=O-^Ej|ZyUnxu#z3*+(b*`f^g&b?YNG@!PBY+V~yMMAVGdQj}R zfp*g|apP>*GZKA0xv(=^w;pS3ZSXXsu=)zrdk-x4e%T?J-{F=H~obLO3G5#1f-PhiqkbFZSERk#~j4p>}FT!^|L;}h1asRvlER^Gt( zq{w0y>Y6QT0ZyeF*i3dY_(Kw??ZAtAEvg!qeo?vUW@YqNK2UO z(dJrz%wTpxEu6$A+oK4$5Ta)kqAiFLH~6GaIaWoP#vPXsrv{;Jg7cU=q=F0#MaPkS zkBzb)=C>QcmD9=2Sl%s<+s zj(R$TLt1&g)k-k${d6v`q>YK$Hc{!0553Cd3o%i~2fb49%Rh_F- zi#@wZ3_7x$n-j3H&c|SaMTX2py{Mm`!segeUT^H6quI@w*MR7Ab5fZ?eA~a*{_3^Y zPU5jdT;I6eR;ayVwk1-4L3uFs7SE#AoerGT(l8o^kl+9}_ z6wb8mT2`x4Ealijjy$|x~%kkv)Sl)_=0NM27(}gz+ zbk#Nv2kR3~T`qv{yX@JeZ1p#T<0X04ItVARidxqSR8r5noOfMAf z|DL^#!SyUD3{GV8YsbYA{=o3@YxVFfDoJ>~?vS4|w&DozdV*xhrTA^nq?oHL9N)$$ zCLE0m<6CvyQUlB$+N|!{4O<<~Ed_iwdze*uN+6I@gF(I4hj9Bsk_{4?#hz^iOZWT1 zg9EHg!3VW|u@HR_aO*n58lWys8mjB@cVQ&-40`G>ZZRCs{t!Dd$VLo1p6lj+@cz0( za%iW=0bV&qyTWXY#b7CKe(a*&^!@#{=|awUTPiB=GDVFmohv!xxuu9Nc-k0x*U_S> zcxG4`cg#wFuUh10_ekCT9Eltfh%Q7DHX%h|i#MBEsM^MzsgjiTBGg*yoTdv2VmJvQ zhXTN$ zuwYi@gDEiO3s zl{Pn(_VgO%w@80WWxx!eL?3w(xH~f}MgVZpzv%9dVb#~ZKdq%+d(GaW?ST-}v%dHt zp@~IIsS*|Kj{21yjk8`uURLu0MdGsbDJnN-!Nys^mJM?QB6fWidaV2nYZRRlxvoQ= zTA^<}Yi^xe)}y0g4qvO>qg=g{?r}AHM^UmBsvmk4Am6qf^lb$GSqE`FZ78^*vdp5d zY|_v7%aK>{?F$ksLaxZ2A8kgKACoE!f5*;iJjLeNUriO8cXs}Y882z(>nz&Y7g~b( zsn3*uu+pB1Jdc+xq-WGqb%w3*{#OGnO47odZ<>1Ql;Yw!PiI5}+JzYykVkWMv!lHb zloq}U;5iIiZ28&u9*;iPkTEz_A^*V^y{rSIa{;TpMEHd6PYP(Rl#7KF=T!UpQn0>= z%<=b~3tTe&x^re$+WSOvM6Fbc(5uz;#nR2zt@ghQ%Xd4zf8kkQcnv(C|5Zli@)Ba^ zqVB=a`Y&v$7i+x5sKfaLw+hAcqLqq0n2-0Sn%Sz}mE+X~KY~Pa2N)Cz2TQj}N}cy9 z8h$|WXzK&*8@sP+RMQ4X32|i(M@4RQ)d49QvyAunkgjeP?zYkp#%yd%7+SviAC(OTDKF*lFaOeb z_R{s}<(8(X^L`3g)>WhVoXWk?ugjZ~=7|^%MK$WX1HWHx!*!ilA!zAi8Wt89Jx(HG zxtXG2Zx@k}p*>fcwj_Npb5;fty!#^Y6yfyq}rW^|Sc|6>N2gB~3(sHf<-DxI?KZrV>smLERJMmsJKqj_24qay@PqI~LQFq%_Ixk+M0+WDOB% zZ=`}mI)$8Vb&uIoFMP&G`#p7d8DLU)-|#E%8@psX!Xe?Cf0P!sz3No;&O1lZS8a*W zvJ$@r+}Rn_saYOP*@IoF@zz}|t`R7o^G(l(Yis@B3EF0>zXbLc7xYQiA_O6kuR_ji zzmGH}MNJXJO=ai1Eh20<0(?zIDJ+B&eSH_`1k8$_C9Bfbnz{b=h!we7YgHoZFiG9N zFep3SM?N7!&7PnRp6jECzLND?$+UDfK@EK?*A6{uh?&m)7C->mzxV~hYG7X&61*7K zX^>N9l^LSHxzXWwvu} z!H{BG8Sp`O_eLn6Bi+@7;~kN53<+AW}eSo1{K+apYS^nGkV=V9_@poN@c-zyIU)pdRDTjc`<({C z!=7&rR#DWmCtTvJ@&Rw_!_@^ny6e|smhIwg$dSR5I|{1u_kUj@N8)?RQy*^TdvHW8 znXORI<-W1&Brc6mr~KcHV`gRI`fk%~4@}SC z>2|71P4RdF=K6aXu!SB6D@xE^h;7T5*K_FQ1-NP3ECNNWAGZ+EFB+(Fzzn3aebujg z5vO}eoe#c$4`A;k%{=fZEz03#|3w~;_XX+oh~s#bWGv6)XP!-V=(Oe?(^YT&ykL(b z;e4mSzSjGQcPD}ZcdU*BG~@M)2?l9zIyEM8*kN@VWs*ltwY|}1 z*1Sf6qkX<3$#ldzOE5d_&Rsk7fDh0C`DUiCX8ch3l7Rz_pwMr;8H8gD}te(hXZLlvi zE*}2X`{S9O)k4-yefgZk*FSpXkht{dGQw_&Jde=I)w_^%0=9=f?y*~`g!g#;1XZAy z>S0cNI*)2=?dy)3B3E9W>-OitN^$m=1%=Mjo`@o&JQv68cdMga0me?Oo+UC8^&I^% z99xY)dJVu^0H|)Wl1Dk;i&;Iar;0ygpwRoELwg0AJ~InBccTg3?}@tiEbWbZ-&E}G z=6at@@Slmrn*Y4UvpThGKk1g|pD|+GTu54mQ8ugG?f^DKw>zl(B{I zbCX$K4GG{u|4EOAm+)kMbHZ*gjUa|ZX3IXsAGD6Sq>vn=YVS7)VP_FRQ5=D;OZ&6x zp0Q+J{xc;*feB5)(U{=eg6b<`WNEi`C{3*>U0OQ?Qd|1=M{JhieHzsQ!4InTP~NV# z7t+3C8x^BCkVo~zKmQll@G`8yE&5k?C@nIqp@5__iG2ng^qM!_Nh|B$v7f*3+tfVh-h2hBNjDzs7P zau=!3+rp0p+w29U3;s_()G#!sC6zrW`EBezGJ%xm3mr(>in^ z`%TUbWP^&U8taK~alYFz7Q}?>$5E!{V7}`Q_-t_4eYB5B2X@+USo7TSH;@$f(r$VF zBPm=jo^Abd(%kTOIfQ_&2ElhLO5Zr%Pj6fu<}|y)0Yj|qz0Z&j@MUGl*yI;&nBWy7 zc>)6v-1}E=bQ^2Xw0R(Ge|hCc=iy2FgDFn>LI^-$DEG0U>An4SQN;6c{t*R~mx!i1 z$pxn48P}t)l#vsBFxmV7<(K}YEHQa_sHak4sg3lVFCEQiBg2jw;@Oe+dp!1o_{aCE zEQmEDWPZ47c&w9K&Sv51Jkr4W0h}3#g4t-VXs?R4oU+U@*NfHEz`%$6qnF}W_*Q7Z z>c!<3jop{Vg;QLJiHreEmdTns)P?K}7XvcJfaJi#<-r3{9Ra}SJyd7n=L-*#z(||X z9yW6E*2uxUv3qr|lV|?KsC!$>xHh$8LM)eF+y*X#!{J%|u@Y`hv`>HNGw8Lcqp%?y@ z%k%sIXIDFnhxi}Bch@2YSXou0wyiZJzd|tjQh;l4^tc~i+xWZpx4GSZ@3#XOWkDsRt1V3Ws+#r z;ZVqUtqCI1GF@Vp6^()TdK{vdOO&sIg7@iFfE>T*^|lT>^!kL^fFVpboGA2sCNVgH z9dCJX{~kAbR$!8ymHFp?S=^4W8wa}Wc91x_jxDWH1;J_O^rjE?k2i);bPqg&!|qyK z%6cth+QLumMGF>j;^W{$FYMaQa=3C5Ho_g_cz}5YmV_wTA|H+|x0b?vHd*IoH0)pE z7Bd4CX~aTYcIhs7dVKl6>6_prU2-xHlI=+d(sH&73T~kI(EN%BsFCS~_^ze=%E!8?_bZ4R~NJr0m`(Vt;biCm0Xhb*^b{jj!pPGp8 z5#4QE8~toHj}P8O$ZdJK-HkfAYhu%QpdqQ#L-Ch~T?+Qxa|F)@!O6{&*tviD?w6T^ zn`wSSzglcIh6mfOAIJ`rZe*S2epQeK4n(hFWJTS5FjC_vzD;wr8${1U@{c;$-L+GV z%*TKGLQYA8^3j#9Sn$>woO%|4Vg4>e7>%hV7-i=}=XBYjs8y&LCl{>T8x7`2Hp24= zQ4+BjJvu6mZh>;LWS8G}x= zPv!PmA}2?Xf-JNk1M>&q0dnR2p9B|!)}~q|GM{9V%v}Rd5JaUloQ`jnzA)y7{}Fv~ zK3iAQ3{rM#Iy@*jmib@p5)H!EX8zCa^CA-i{hfvQT zd~_uV+rIz z-Xe2osU+2U2|PQY)15!DpXjSYTVvi8vfguWIC|4#=b%zHFmU)9cm-~IVW@IBIKjq} z_Ny_sTwsR{dfkJnjb-roNG0$C&b9ABI4+9f*N5k$10!rC15&pgzN9G+Xe5P9?${;J zQ1P4^R@w$e(mrTWhwQxwlCkBnyB}6M@_3UK|DI{IwMSF8Xvm4PjF+yS6Wf0FFm%vn zYObE7UkK@DE+C3pQF8zoUCb?iakQM^m{)^dh$YewCwk_ScLAJjf)nI@34_sm25yg< zyBYzM**hVsU_v{k)%_S7J!iKmH~piVb}REtY6O;W8+PWSuM@TN_jem@%g?gPT~NN4 zQ93hzea;39^=RJ(pLflDBgW*E=aLG@OH)(RHF;tUzI&~1CO z`dUDr&vak2j3V0_JMizMM{*oDgeh2TjsZobwAq%6o&JJ{QJfHMlW%8&s zK-=jE=p!iT<>a-i;h10R`B72bIa<81fIEPEOx~|`I9+LrBtEDZutUeGnXjc2oKU*x}2_TxNeUME2F!(6B^Jos_ij7HdQs4e7A1e#{k# zO0T*P4L_rVmB7>LbJ3&kZP%#nR{Y#SGgcObE2jZ;h_=V-qvdRm!&b(DQhR_aNF$Tn z4!vE3vM=UH0Ngjw$V{~nng@<9%a-+4z~Gx`gT#L6;slr|s>Lkyuzy;t^>}u3(xWeH z9VE%$e01Gu+nTn6f2I`9_l!KRH_et%$-UL`d^ut|`{6I6i{;*bq?Z^$vW12S(v1_u z6ypUvs}oLy&?+ZbWEtsFN^^#4(PxA3D#u1{O7F%ard zLGgZurONS|x8wVk>W@6I3%#=Lzgq`adpu@&S5C(kCD`!wKRQv6!pvSGChtBq%ZQh# z$puFbFE+lD<;=;AI2S`d}q_S|A3%@|VPlv#)@@k|Nc##`|#ujHS>CTDnt}}YrPKY(wvqLNw zo+Y{MuAwO9D`9uJZ!52c_p`in$%8=YeJT(gVU2A0%o4mtmW}yTr4(7(`Er|{P_-wa z>h~iv)@7%RY3Co^<#?e2BBX$bQFUvB9$IXH2YXei>aQ9q9o+DRVpCfX*H+?o-UGxy z^+CkMvxhzIJ8o{Ef`5JjNhc3b=A-vqGRA(S^COk0DNOoyWwwQ&w8ukAyJ3NpDn^C; z^4S8}`Vpj8SXqN3f+ur)#Rn}NNp*=`;@D*w%{{ynxuyLR&!O5xS3Rn=*)XglMfl=5 zIEX&Y#{mw!*Qt&pQS!1z5dhvbRTj)lb(^2IwRU%frKG_5xsLIl2M^pf^Cv0{5RpcJ zrsyrU3=eQLe?@Ze4ax{=w5YQNsjeje6xVj1$ z8!L&zap$i5Y1g|7g}(lI7T~mXmzEO{+ZOUHp=xjT_Wl)>-cY#^5%TG@Z_rDt=~Lk5@spoba%Z)fbpd(+}_B6LdFdLzVKiSV9tamWNH|zb&W?lz#VKc7RMj zkbYe(?o^k>lzD{Co2k2r6qF@4X=!o#ub3@46`px`%S$_yS&RHi#thz2I3OLN&rG;m z!Vfm-))>~Fkdli2&ItD}uRK>mV`2E7QU4;L-LT_j!~Hs@cMNgyI1tw&IF)ca1bTpv zD|~Y3>>NIrq#d`)QcBit9j^Ond1n2Y`9R;zn8~+h8QB~cob&gFG7%(OHE(tvCT-P7 zcT7KPEYSYN9L}s^ru|Ojog%kMw$?dRhWM1PuWvs~U1fk9t42*7ao?xj2CfY@OtyI^D?LQTZ_hR4%gv@l~Rne`e7~Vha7tZI)w;T`dL& z7K79>k#aL7GhXCDyv8^_MS>&P{JkHvEyqbUE+oO2`#ool-Dl$;L@f_cCR5MdmJX$t zPzp5|C^1+3-jc!=Lc{p5XkSh+^ATuhCZ=N8_Mubyo zBFHsuKg1xV!+(DTFK;HFLpZ1}`FAX6Ow<(%$AUN3%t$e!b=gsSzhvI}GT&JBz0GB| z>q8Yk5F{EW$n)iB0*C%yYhI+w@iu>(8Rci0#M%hjQ4DnH))E_^#T3~vUEE?WTy({u z(-EC%^unq1+9wS;V7eT2#`Nft03#k|{;lUB7pchb&XOitPa9UzsR)x-Vd^`)7Qno7 z3oyc`;$qo>LEnm*qzTrC+pp{;l=S^8jHH1KVz+wTYS359Ltf0IG4;7kL7X`@=$WB6 z$#VvIem*V1s`(3e-I(P!wnctC2kIn0^w+P*9`mobXVr3lNLqdMKGw`5%r4lm9}=T4 zsX~?XWl*@}FMRMga#!&{))>hc#4;=+tWdaU!qGO7#IvrS$#J)s{7eQf1}s?>=D*K; zd1>Q!cq;d(-|eH=?7lb1kDydndH&CMo_<#w4u%?L$&U zYm(2Y-QCkco01GSP0M9IG;7pmT9W42#s|O=kcJrXU{+=7EXUT>e4A~?^Xwu~lE;JAD*Z3LsrN8|%D+@*3+THdW& zgQNHu*2`fjhv%tsx(s0!-wW6$3PgG`n=$3ng(4)huWI$?kC%PUPDjwqP-?q6YILxJW^71P6*Xq~KyV5V1NiRDg* z^VH@&j|r;}S0C+dj~_K3SeK0yX?LQu2t1R216|8oTh%mfUV*8-9{*wfk}e4<>2}Jf zrFw~efE#ao^wX!khcAK`xK^K!5ALgC_wp=XGiosJOZ5!p*65uS@!pF|L*yULwl@|U zU%c780we9`CSBXG=($lkyyWD%N?7~*Yc=U{-9$}F4d9(u#dM=H-5hNW+x+`BIFLpI zke!lV6<`B^734nJHCbCD4-qzvcVeEMcd0+ZB}KzKQw}v3vsa}6^_X|dZ62H@Z!(B98ECw6tCd1}PVK77NsV#%@4FECBy+PM^;7FdBlfG&0UBz4)W z)t%t8s7L_b$X5AA--63he0zz0CNd`F=LKF*ImAh3 zISWgK6C!ZlaNZ8sN{ES0P@i3|hl?}z?-_7EehEJxS9r8l-C!l;;Jvh4P^FQTezdXj zIZah^jYrds$UGlUEcDB_zF*(&ef9C+jQvvR_-e<_agt4zaa1W&$)?h@g7!61L_z$f z_`UAB!_8`BQ~lCAL7j=-!%b)R5xu%WTfr>)TVK9zj|eg$hxDw9On!YddcYwjjL!45 z6@KWb9H}g)1;Pp5D)eQp0H7KiH@P*mF#Blu&b_Z)qiYvO*Q*N!C`Q-4p4q*e7pD7` zejLKzR&LLYkEQj?NPoGDK{n-!n{3%J#ir)tx`Z<2sdg)fd{Aa)Cys>VFYYQw<84`) zZ|NUYwA98LMWt#KZSg- z(B7{iHtwNF`77ZGZ%9ze9-7xj*K2v;rDOYDD+(`qU|MP;b8R{@q#6v6EsXpo(xssDFHhkx;SX1t9T=}HcNaAey z+wFpf*!tlp%Es)m=FM76y}KHTAwrB@>_$pJy@d>cop%~8`Pxmq4+5!X6q&Bn_1y}A zSr)qgmdVIkrj@DjT7jsHc!=e+zIPFDOIKv7#?u2&hs|2_LZuMeSt49~^a5gFZ^T#* z6k!p3rP#Z@=}(G&pf7|l3%?^CDiHDLgdcQ`u5+*|x+$9bU`4<(Q_Iv!&7ohV77hzj zp4O-q0ra~r{Mi75!T!<8!-}Tm!U%GnAV+kQogRLM!Trlao`+Nn*H8;>1eO6c4U!#- zw0b<2S+i{9&0MZz!=amREEySOVk(MeOg~_2SK6RAIY=nT{&VeHggqDSU^UT@YVzV` zolYTJm!pqPH;=mP$om~dqvN!f_6uu>p@%an}uS|?6zEr|6p3& zdlsA5U&J>W;n#1rYc7`Sk=4QTVHwjdOT>kY{DbQj`Hh>x&YS;61RD9obQ?*WmYOEj ztfJk#b-qI$XCzNovsTw7@wp$;s&n7mrPe@bAxWG0*71@{(zx;|>z}WqjN@yh-Gc63 zO!%g8OWYA{BlEeX@oE7HS>M2@TYNIMdXs;IF%*%bCjwd~hk)-#&mmHa(q65dcprMh zocUzh#d*Vb0y`|SFtxHaE$fR1+pIZmJtgug+36Y!aT7_Q#}jngx$1d# z6}ywT&^^^l{&K23TKZOkW1=fYZ!FHY!M5)rNGk}IT5u-__3Mw=NO=t>8J=n`#(%Vk z9{jn+?=fE`E}n5--rpBruCvmr)*M~@EZ0!xeZuPNr=X5cJY)RChf39~>GRF^-gE{# z@Vr>*^#7>3%BU*4rhO*(saqf9FYeG<0B4L;zXQ@*G|C$jzw88;b4)h43w4LHH4i#cqd-R+Lkq-Ir5h%=2X z*`P=lhSad)m|!w*7KaiOH|Mnz_jyOI{kICl@f7%!`M=}ShQ;0*)GK$7wRIZNf{vpe;R^IVR>+DUzO=h0UX7(3$NztzX z#2fXwVv9Ia@6jP1g%fZ}mL0il{E+ARB+-`QTJ(9^%`x{OW0etGN6kiC=cGNX42Cv? zzvUB?UaaK4QQYo87V1FD15Bf`1!u#ecEfTUc|^9!^w#!Yk0Bqg_&pifY`?coQuASw zlr)N@p-+LYVvD6*1#RUS{yllN7+kjS_c5(G#FaOonHLgxPOv`IWXC~`&CIbg>W@LY z^QMy+L%$NKu}zj)wM4L5InhDKt-H!gxmT&dex9*G<4KHO&mBl1br!G(Yahz;Uk!+7 zdkYg4a$Nwz^~kcQCZAWZf%}F+a8E}?Uy4V~T9uldgL$RitM~{lJPjElgm(n!g)aYd z8Dg(SqG-!DoCppu9Dx@yH4|#Sag(hWxn2Q(P!<2yeI|>OA)D&=`{=I2T$J3 z0gmh1jhT>4<7T2lD!Nn#t)tV|6Gn~{6{;_DZtC`Y8eFO(`b%_zcr;#YqLVO|IV%Q% zZdSLb-#_+s$r9*{pRXS==MhCnPCf-g13V@lB;^{r+okJ_bp$79aFnUD5yfYxYQ8_x zfk2MF=s+hh9LG}95wZF0r{@xLmR~FrZJ3?et|GBewmECV7;e< zI^z?&8F5QCM@Iqd3sv3NpPBEYddy+Miq z^l3#0EJ#I3%+?VM7#4^AsC+O<4mTMJB#(qJ-YmSwA8yN`(qu$K7sa#ea_98-zZ`$U z1@8LIa>^3-a>q*%^waOqmGlI55)cb8z$z;N(%$R)iT{bT|O zl1vF4d}f4@>D^nKaD4ke0ksIQIqFf*`#q*QKGSmyOG{n*rS~eyQ6M(g(*pY*he}gw~!7&1_kEsYDH$^8&kK1Dxu3%-di`& z`Kg7=s+Q-hgfPlUHPWBC)NhO72dB4~_wqoE>FD}TsM3qr0}|_~u-$p6nDbGynr*$s zye>OrfC$Sr!qb48Xh?gBDt1$@z!Wl8d^~NB`|D{;A?oI=*~-Dl&k2vkF_bNrzB16w zEmfp?Npq|3_Vikzl9~)Me zXS!1QjH(%NS7Yh1X>J?MYHme}FTgLzfF>&SW4h=ZNqMtb%CEeec+z=w8{3mAFFxb` zOT#A4;^S>Qp9QArS^XCX#22_JeHgJBbF-^96R1?orW8du#m`K(6xOYsQB*Y|Bfb6w zQ-n>xdS~>k{#5F?UF@fxE~mdD1UGRD)eM2WfN=ch|5Ejrd-*j5@|~WK4cxHd;8U?uWu}=3g+eKa66jSpXj9&&1kQ>OG zG1e%7_-wl}Bj>uoU#Z!3u#`Dj;$E%XBy31C-Os=9#uNK4+gO`Xjggh>UInJ(G(SHb zp|{7CCbfn0T2|EmMB6PlOSX7M?yn=U2jf*XTk}<+pU}6t<44ATz^-t6d=g=I+7xd2j3S%9}*;kC?2u8)zbKRw%%qnXlIluQ7}P@O*g!u32Agr090yKC7@2-e$By@1(F;C z9IS+-YAwHJgWR6Mq|KZ#)A$QTAr*ZpLtWz6tTOG*pvB0ph#{-r*QX6vh9y)rVF-IK zp*z3^q()<~&|y)J7vdp!2!Q2kIyTu~xMxF?=d}*?ONK=pULl4vyTR2@?!_D%l~azB z3qjpW6>Mz&1^V&gnCwF68>PV3uyJ%_%f^4eZ6S7i-OAY}>s-_Jj+F9|FXby<(GmXf zxRRpg**v0P(kg1*uAb6hB)#!FYH!&8Y9e5n2>=#ng)}gYc-YM;#bU|2fD)2?6&kB5 z5FR)g`-HYGF+B|9o98rRHa;MQ9`bYW+YmS}J*cWzr)aI{fi>`nr)aDcf4dOnG~+qif)OkI*;!36;7wUcKRP4oI*qW*XT~QoT81Ynb}h{#84H zk(&|!UXVhR`-II@e+oF7pwM$ zqfGv4XLOt3qGKdm&_xhQx?Lu^@u!Gi!-f78Ni+*q%zpQ<94lAZHJ`Q=B$&0mZQ244 z%x?9!H_O+#$T8>r`a}ZEt;)}0ezUzk$Yie8yrLeAcp&@z$h3O*cT$OZ#r}Hz#Y(IU zYjvHeFb8PH7i3kBT|z$@-7k-yEl}Px3xan?6;W4xLp2>WeKnUTyC6hA6BopEvyhT- zD6Lpwk6!!G;3fXk7i-*o`CFDn>h~5|N=y==qdL~k3h!FcsQ0XVe>c2RvsR{Sar?6# zy0=tpNF@YZq;WJmRFOB@k8C`e;+^P|XGfo7WxVHsnS%Lj7$ujXG4XoESbF_MWXLoZ zlJ2P-1L5ka{=(g~R==)h?_Ls0Yy%$tk zIa-2V2*d~kh=F_jYuRyKnv@1>9s4{50HcSs_KOd$htht_U2hCp$h();H%Lu?0XY<} z9?A22kIu`d_Z>39qCf@!ao5Bgp_&>_!#$>bVXm7kg{=L_F=4*zP#f2D?c6*w*{z9p z?g;XJypN>jon}%v)6%SXqUq6TdP)BMf4;~JYw2WO1(Z4l3Z zK4zr(^LzN>PAhyKxGwM9w;O3LU~e5y=wUFs5lx(tf|b(vK|ajGIIzB}^I+_$J2m<{ z1@Or(u)az5*7Vk!&q+?Kv#CYNegJA(s7R!PtKz9Ne`>udSV4m9#y)!H>?o~07N?ws z-)4cA5 zJ+-7bt+ml^`#D3`>w3>(o=52!$}v&_dL?CBG>iYjAMx_BR}9YCI+yClNtf%m6VJ#w z##J;29}T$){kWP_ZIY7fQ?J)}nb3Z(y9#|N3xemsU;P5uzw|8-p>g5@zB$jXU#6VI z8l6(B$%^&!cvndJ{_2nsL%%}Iff{yF4FnmZHRHVsnsf+aSgFEPE_ejasVoPX00>9zh%!LyMDixubqIl>>i+8RVeXA%Ag0RN-2kReO-%=Q~gm2FKIj``e2h*3|9f;yei|d)uSvzLg$&C_+@f2zD-R(O~0Jw z|6PEff_p|O#xQo78H$ZDEZhXHHmPR$Omb#1F#9uf0$QQzq~X(?Pq^SQ$KX}9R% zK1k9}lQR7E=i|58Vva2fOi>_X82|95ZzO)~sh;oyp@Yi!sZbFI^?_?~3<%f9SELyv zaz!f5_SDp$LtK9dDounami|p8{f0~+Q~{Xssz3V9D9VUIHiMs!0_~ly>pV=mfH}Il znL1Vr5=YpH<=}^ujI>Y6R>~9A-a9t+d$Mvw5mz)pQez$~_*6~$1~ok+VGCC<=iyUk za(xGG_hc>G-8au5yOdJKZB4RJ^UVBTWcc#BUg6B&CkjFX0fNoN7HDOkv<0NcZaMtf zdHWv-vxbv3`iL$lM;Tyz z?}3b9BdXM}rj1ve0k*7U86WN51xy$JRPawBMq zpnEB?TLSYoZV=$71M)x{o__iP6oI;4yyZq4eoaD2pw_hU6il35&Mo`>TbQM^HRA!@ zGaKA3-w5>TlYi6JxgKN>+~hI44?r3u4hZC?*r7(%=x>tcJL;I3`RsgA9p?I!CDQLH zlP*p#HXMzy3hU}3&vQexX~Mw$i*FE&zx|DM9{@dfG=`(F z%OQ7@iEWz5#chO~P^#=M9oLWsZq@yCqz?%x)uxj&s=UC&qb=`Fr&5SeCbYfNhxPm8 z|7WkvT|`KH-bTYCYC2sCn$RAQwO(2zDAui23eho0-yDaq@|2CVV<X$WOY4sp@)myWCzp)R$10-3L02XVLYW?KMG`iFpa4MeOSp~N_b*Jvr)Sat5d4z zllCLX0L&|N-2QrhD0R^$UetWfhCm=b&#XYN(xOkD1?q$B^oX8&@JY-a{CQi|1T-Jk z-U;XsVl^iEQTgs?L7Ovdo15E~_F45`0!ru14$*_{2K7Izd_=}CJfRW{qfT9^OMDw} zmzwAKb3V_KX9GkHCt>?5`6Pku+?nX_gC-Y`_#k+)QgV#$8C;f+G<=BtB?sNS?s~I* zK|*eFsN#5``5QZc8@D9!KT=kTDROe4A{53H9)mo^n47yU;tbn%a!O_>W6NuQQ9V@N zPnIwG3GC9PMyPoCg~n^lFE{AS5m+{`-*AKJfi#{hDN0gas) z<4la-NVVG2)-NNi$R9vz?J?KE7PA(j5V>4b=jQwm^R?28Wmf{U^Wn#)A=Tm>z5aR9 zC+ujIDFUjq!^8oZ_UZJPKl9uLxYz+~^^uOE=;0QMW#Z?HFkMZU=0~OFq=g0Gj!>MK zqKcn9f}(qfD{20m8c9=+{Axd?>7xdi-=q6LL}ggHhdzJa=8K%mU(6-uT=Q~k1RKpX z$IK{=B-zV&Ifqd*2yP4X2U(YOCocUcgSr4R9B( zjvVQ|efnVhsCJ16!|d}Z5lokEPGcQJ0*Zni=YJ!Sp37JJ+a>%fp;8;n*4p$&G+Z=6 z|4c1;6$7a8RJ-k$D6Erz!tT`vFL4g~6Oyp@RUKhZYAVL=y$R+5`Mh6(9 zh9!WY%vpIaWvwto{-`SUVn@;bV?!W*@NVnSEX6xz~{HaHS9JA{1jzYLe zT_Vlh{K%iHogq&j0Q8F$c|=g%n$yF*v+CHK(nG`(s4sF0ZDxlRI;l6g#0VmSe4OcL zHUNP5iGnBtI1iq4T*BRk+4K2k*LN+~FIFyP;-72-S15GF7)^;myt)FQW&!~IJ*X#;)ZTgDgC#U3mp*2mzSf1zl z$+7*CK$uLrV+KceaLPFrt750$kiO05>qCuD60I|x`diHdQSbw#f(9>NWDaCN>$Bf< zU4_Ha+eJgB0}Ck|Z}5JK(LGXD`)2-EXpKhM-hz4nksN~q>@df*NU@pM3Bs8@ z93ZD;f@L^zKgX0XCygm|M_+Q0x|f^$%3UW}kS7v)1?OWwLw8si{wu>Rte0oywD;RI4+xB$8cj@unE6B~URmo~z*ijX)~bHc-+8L4 zKQC@JG`PxQ@wQU_&yB~`1HFFh2Q-8&BAIm$k&|`nuN~%-7xx)xzU+F?7PEr4UnL6d z_*Mfu^3qY2FETg`6_P}3zBB7evk!X&E3}|O|9^5_gL?^DP>YX@a+A=z?=tX=k5bM} zi?fY+@+D)U3C2|Hf`s4Z3!D0_LU~*4sj1KrN;43YhXLf`cy0^~UPn1*P;s)X?D;;K zV#B!mnMw#-lA0ap+cLcNbU_A`wFAVDSqM{tE>g)F6_#_~tY3A~p-gVfSih1=J1e$< zm3n}j_YPtSHd@ieV^6wdA7=$^%9_>}=GNRCxa#WGVXED;$*MP^(-DlMRVdaHY^ZZ- z+&~%)VMr~}!$q1)pB0!mAi=K_6Fr=DG2J&Mu=w*_jRY4k9)kJ6U~&MeT3l1 zi67T4KUMCHx~FkbOTG1-U2LE)cYSHX%XgMAFs7QoFR|Ub{@3#@eKARBW;ER;gkqke zj!0*@IZ3Ef$0vFdX8R%wU3FkEpvCf6$3Si%^^l>n+k}9+q%kD$U4!_~@Rs3fto zC2p8-KNoUDTuvhx#0H0h_0?Bi7OaDivGD`nK@_3$AKzUk*~o`sbSeG7J}+6L6tMwk z$iTY$u6d40$*)u(Mi!WBX;gRB?xM@nPoLVNPhcdKP5%uMs~QP!5GaKaO2a!GBU1G{ z(b>^~q2+!~!5nf#{F6YN&+=3!FhI&6D&_Ac&X^ z{5M&Fj245I9>XmbEaX+YF};(D{}dkPY-#M`EkT07t4YAy`0nP6CDnFdx zN%=bS<9etAO5nyx%C;6d00x8jAtM+8XNfMWZe?-!sa;e20gOZ=Zqep!t2@Qyqr=9nwQU2?pkvo9#W8x%Wn{ zPyAM0Q{OLTS%_YSGWwlJQ6Frq(0F7F5obG@(!bewlTJj_oM4ZEir98IG@XQv=smaC zoNa2O1upIn&(EwZ4*Qqkuk_3CnqRMq#q?&OTnl)-P}xm$)vdms+wN&hq0FA)pDvQ* z$>6pSy?eFO{uiU(irP&m%=3ic-e!DK_=5*}{Ys|BMot~WB4-f8D3`tX^MjLZ7i!`N zDv?)=o)OfT=>RTTAF_m+h~|J$4JAR;`&<5HXc%K(f9TjQ!CB=4RtkvrJ`o7ZybB^R z&5(M$^X_%b)6>p+R$ak~q;c0LQPMkd?ct^Q-^uZLx6yRDgu{Zw(?snSG(ufYE{20| zi@<z`Rs^ijB?GNfz1u~(bZQj`Lf%}f#pp?f52wQ_tmK(c7%5(LU zkNf5TGy4gjBd8j)g6NnIx>-rXwFEl0=wT-KQ%JmCHPDH0%V44x%o6*h2Xf1?yMP~1 zu4x!L3j6tq@&kHZs7oA|Q-%SyM5_~Q$#YX{&y6P9XT`yX4AQ&={>STHU)a?+c$Yt1 z;JGO^DK}H(r}YrkmJ{U6hRGdoON*63h3yg9q`d>W5FW@#-7O0ngl4Qx$wUv?5R?^- z)|%romkuw!%MdT;9NxVM53V(r?OO z_F~FF@k&wlKL-H1OmF)GUQxqF7oY#-@_#mxxIN^LZD-GS=Rx0bNj0^rUYJ;c{&$o| zsqdT4gDP!~g0{1MVBOuR^iU)qlXB)D0IX)f8;RORrN{*P6cZRwxZKxu;*w0@3d6_UhNi#p60(NtF8Cx0v+MwATB#vQ39>I6O;fBzwDIBn&Q2YvO;ui)cNc?oyU zY_V;Hjjo3UE1!1Mo@TuCr+~mw?(2sry>PsoTytG~_UbPOY!4d6VvEu^NPdA; zNZeWois<1t2?fCy&-v^o5NjcU{pn|&sVHGOUn*r4#8deJxW{LMVXzh;8BL{$4fR|I zTSEE~hWn}OCBym5l30Z?_vdqAg7SP-+uv5Ym0fHk(itG__(}JzH3MWEZ!|DRaCaoP z1joYg@N=vZnUrSyhVJ7BQphMjI?xnE+XOv;X%G31PJvZm;wF+pCN@|7s9eq6Ij@}@ zWd*+2sE*wQDNh3bIaFU(ybwnSVG=1m4z~cpTbGF5$VMUWFhVCsTAPUa zXbi!D?V9e6e0V!{U&)|M;yA<1sv_fWsZ*}j0JA>jg+#!2%2-E@zjV(&j}&J*##DOV245&u2vyFALg5-`q%zcBZXUG@=+HpyE+u!xUMfxAg zK`aZYWIbD`nxb($zSQB&7WiiKyN?z;9bY|vW{@&M_45%71mPsiZ=Sl0o#2qhV(ErQ zXDRi6s=a6Bob}L=Z#Jv{nedQB6;vQ=i6U>8@n*zl-SLPXR*X5)OV5Fw0<#5@SlS>) zR2FEe3Z^svwGk8B#D;W1C~8+4sPM_g$jR(yU_Uwp$rjEfA;48hv!}sG{%R;r(Gl># zQay3?jO}edKa5c%PF9w%Q5M4hN`n%eKCdES%^Rm`twR{eZd5KE)A`n5?lx$3^87O3 zVE7clMWCaA460ihwFAY_jx$ABOIJ^ItMka|*l$(g)RCcJtNw6Skz zIT0|cT8*4>R=1r-7KGvI#750ja%k_tNLHB*pWH5mNciY@0XbsxejpxAoG=Nn(?fMFK4;uI=@ z{2=%s{m&BF+ZiAtHGo6#DIwlkI`|a-?Q^+ebS^DwOA5rw6$CfSOaWEFXsJUGPTmpb zd{wL~TW>xhz)!fR4We!E-zx>)ndC9|A7Wh&xv4q*or(1I3||Kx=)5UH!oBLd|w+5Ue}_jL&YNF4q2T>r}N zH^b{pFkYHp2o?ME;85TDj$f_dK zQdVcyWja)ND@@e;gBg`FZHk>~fk4~{wIprgy0B}~*@ux};QS|3 zTx^v!^NXGnO*6jpLZ^Z|B3}cFdNP_ah#;ra{C=Cdb)8VolwuuaLj0VHsDIFljA#Ks zD3z5TmD%3nir=beKoSasXKbNB@H<1QAa$ysbiI@1ZWBsO!Jp9emM0QN=m5fIw#PQN zJqZZ?j)AcXjRA4HSVVTOWtI`i#j@U_6{}TieB;$44bjgd4l)wtf({Zy%oL`NlL{nKs02F9c3%@6JDW1M0mL4Dkm2fW<1_ z&?0->&;pA!30qHTqV!{P6kEEmdHQnkNH#$n6u`JB>t1|U45J1m*H4hcuu)jIJ7ULs^2@jm}@qSbuwGJ?8%Jmu-%9vy48OJYj$r1J4fxUI(tBE7rA?$?vR zzn4(uNuROkAMW?`vx|hd1~*n<71olU*=JvQZ+<|j1q`f~uIxQGAamG=(2HZds6REH z&JR@V>9AWj@U)6uCv|l@!7imZ3B2g(TqYHeREZI5 zBmo|67b_Mg8C92v+?WK7k^wTXL?uq|qi4lHsvfU$azFgj5gr)!ypR|<1n+I6WIVD; za=?q>K8H9($`yUSHnQ;`aT=Xe0(%W`R#$f}uz$yks&`^_TnYL` zh!QhKQ}tuQr_gny`tAM*FZ*-tC{MlRtAKV0B_`O5_uXtUehm-UNa_+%n%hCg#d#~) z9#H%m6HXykpmUMvT}FaA$ZBZvcDm>|9>*zS#;souuYnB0=U-}Aj)u#R2>jJ(!Pt<2 z7r>3Ofg9C_M}~-ce|JI&)uMHGEZC;jK;QK#mDQ^7WPc7h0snZ)8k3$gNvb_0YGUrT5RfTdmJ*zlB-{-9t#G-btP=wTT9V;SCuJ+*AsC z`&eeysTjWq=TgbovgHItNFf?Pm&%b}!Y{9`Ma@Tn?_Bz45IQfp%$%9!4}EO91C{^o z57eg#GQ1pg6_H8-8)hlWxK3Er7T3U^_HM$#ae6^|h>$kR^*Q)dXCU67tZMXf(cI$xxQgVS&9Wsb5Sj^fY-U(OK{HP#5?d1ZQMx&Aq- z_Wu?^IcDTgbCsiWMWa82I>Ntjp}xzQSEtI^k+qQK%-#ca&&N}$C;aNEwP}NNo*Qtd zs;28edVwiq{5lRGyQ{AwJqmx*rF3F!gnQyBflFKJyL}|PMGI1>?|#JRc63h>r&3la zY9YT}Tj7QLb9r%GdZ9fj@}o25MAs6nENkpU=Yh(y%SbbJqIH50?GIEH;?BLJE&K2o zZ#W2b5H)au$7bkEdD-~i8Rf+I7lar)yepLTQ2ti_Vj|mIdn-l0XZMI|4?NC0*7oZa5E$^> zULYke=7p!Z8Csnvm=dn9oGWXI6U0REKhOqjYkMRj`|Da(D%MYgB53g5`15bK2xxeM za3&)>e$_dOR2N$CRbQ4lb|RM)t50?CS3nYF*A9Y{-mCwapW|bz8`FPGe{miqhH4m%w4A0ZH7I!AYy29%HZ zKo*%`(OG)rD4@k*QNB_02pgQse~OVj5)}sGR{*^6$j|};Q+qfx@Nyh+`Z}yR$aQhx z2r)c(dV7&4g!GmL+C9L1&IvH9Sn~h^vGb3x`8^%8DgwZA+mmFlE?xqVp#J3r+fLuH zc3Q!9QC^BMy(b(+xaBLE%MI zK<;!M=E&=;hnD~1)Nq;g)vqGj6cpQgdQdDJ=;GT0QO{m`hzE+DpIN-`brK*% zI49|)&L=Ai!JECmY_^ysMnK{uL|+~asEBxJfi9S0xwT|tBZD)p>!gom_Z<-Af}S`h zB}^B#6yK!+%C)fBN)9U5#UW<5Uhi;8ouMp~u>R(w+sj>s*rO}tB1fQJbDPZ-6XftF z;$X0RwRSz2sBW5Zzm)a&O`KiudGL<{fd#(T8U_+<9AhJW{_lZQ>l;Ha@~|d}rJhi( z`swzHFIk2>8S!6%K-^!09U=g#Y5SS-{{FY;J3yLzuw+9(Pb#E!Nh)gBFd6 zK5QvB?&ANO#~Rl;@p42ustA@6R*%@9VZgbvrf4Sj~aRr+EHHT%;?v%y5sLw&6W+2Y!&2mUh>bivs-Vj_mzn&;E#Q!UrJn!dgWf; zwnAWY_}sPvmFpwmd|e+@Q}o9M6JA9Jnc^5PRF&pid#)#BKi26VE{ogfFycxE>j)FI zbHb*A8G($UwsR4T03ZjXxf_3QUZh4Pc(Z#Iy+S#N5o9mJlvl=&vc+jct4O|KLXd!F zxQYr$GcH4v5$GC`0U|`9+W%A%$z9wK@@4kTZ=ZGNnxlF0zwMF(21eHCWCLkfYpDY@Ue|#m3iBan(rs8}g+!VtYJEnK< zi6BC~{4I!=)1pP;)uyv_p25?6#A2r1Sl@cFVI>bLX5(J~a2Q~NSmJ1!io3lrXp6~` zTHrD(U05$jNhfE(8_gv)0htpai}}Oqhxc3P3Bb+fEFcVa7NxTO+Y7Mc_Uyqc$4^b_ zCpGiO|8z7|uDEZ5XFFqz81e&@VB}ctDz&M`oIfWiV>bX7vlO5UA_$m}udo0YAjYbf zY<{=YVIF;Ms{gej(2&`EHWkDzBmQd|;~?! zzIV7zTiSsRCtqq!0S0UpgF32&sLwm`6MUsuU&%+xVn&Sk_-|?bMB!XBN={e zjcatyFBC}9hGpIif^G<{CD1YJ=6slQF{p!<63~B^a=Q!-7S(=wRp||^Y8ho1nQ2hScrMs%RzC(xyqfIj(0y3R`W-+gnzeBi!~ zDtz(tuH_oh&GQ@5``O5YWgdx=1nbB!2u5V8oTch949jCV_9+|`;8u^;L%QHWhn*l_ z46grZPQn&`9^J>>)NM`qE@R8l*p!DoTFUs?eywUdn)$J+&*c2y{ITY`3V9wzSo-cO zBAip;^VfVvxTXvUDcCFzF(Hr%<5y5Y*$3{V!`>a6ZFerUQfh^M6ffQ2LT14AMbE%3 zo3-gZG5dVzW63W;|8yF#CG{^Ff5ZKxi-@MjpegUfp%qq>>K`tX!5;^Lg6`IV1;Xf%C?U-*;9=$e|O zyg$GKYf#5WkXs{XN*?+D>8G6|FF+I&Ctyb@Fy*j_?dZ+r(AX>{f@?jr2Mf~Sj;X3Q z7l)fgv~`Q)LXW{3uE?sd{G!9a3!i%4g$y8pW)vb_>~UN|rM|_FqosOJ*hW4KVB66o zqkEJ+r53v&ei1D(X%FdDw$(X9r(%;1P>y5-r;C7V z^p8LT@vG{O1^l~)7vdv6`^X`T1T`cB?4iG^jns?2UdIVJ(NPoiii2A)U%Cq^j;z{A zvyHjEfr@wf`oGJH(A*trNAB&|T(mBYuMhvwP zVC1zvYxu>wPHY1I_MT?`-ZY}T$(;c{vRUapjufoMTK$1vN2^!U_TN1kd`&~%gFdmc3>u4Rvuf0hh1zoCIAWQF)nW2};8BTsX`r}>R`20rAJ0L&CRBE)x!s=$E$}puAorzS5~(k(-}IPAFy5{8`VeP`5D$xuKSk{ZrEn@a z^}UyO8$XDmn%te9rkRQeY`yJ2h$i_Slbd5)MwblEZ26`4`daj+@j}k`QDSX^`}mwA zDQKWQidP(fK$4Yk<%~F36{1mV)HUoi$kq44uptn+5^xbf?;4$}9VJ5`+*CAFbQ31@ zuh;s_fkqEhFup1MSNpKqlpgw+=YQ22TeIoc3FPD!dZX_dts7A^UN z&*lJGLhS@^yyH@8_8Ur??gokbFK{SNV8<+RKt$!k<8_gvN>EaNUKQ!b1g_N+2-};H z-X>Y+mY#gNlRE9QMUz?Y{{J55Ht3Z_L1d#~`!B!R(o(VxS*MEAI)XsvfB!{Eyb@hI`ReFl$y;_}qGU^*Ttq~G zm7FWq@iH>l(Fs@m=winhX+nR4377C)1%=z4QXYeTknWQRfM+*m@I?^ct3D?E9s)Q? zMDOubH5EqT>z8he*u&&|5A=UnJ_`pnetpyGeGf*`tRfMBt?0j`Y0A=$E$bn|tpv;2@Dw(j;(R%t?^R5eC}2H6>pQ z*?7Kmn&qDi(q?hM6#{x6zVGyl1|6R^c3XTsK&X3M3j<#Ps#hm1r=pvhyOTut*Wqw& zo&*=G>@op{?h__cq(It32TD&EO#i7B?h23cXlX(%tYbU1p+8l+W!14qYTg#dUEGY< zz*ow*dHWBjAU=&vUewH?)OG=8Wq%Zlewz>kYp>CYjFX*I4{O(K&kU91`Ja*6EjRt{ z`tkinN`G#f10SQ9hxBHEFIPGA$IwoqvK!!ZCJiPN6xF^iw)tDV!Zf8uX>CXwceNB5R zY=G+dm4LoF?W0tlsjg#-))x_v@@v$4z2lJ~19!bmf9mo3Z{^Qm-c{AkezLRIvHO8X zWTv8>D^=M0i_fL>^Gzblkg^Xu5`;3JFSU_S7YK0NyYb#oV-4oo($gL4+`k$42!LD6@7KD5jf>^K= z9)S-eD`w1-@BYc9{gqpPizJX@_c%sKn4%uXf|TW>dsPa{qB0}NYmNQgl_3jD_(J{DWjLK z$L_tV)xoE_gb<iNYe^lm7Fbt36W8EGe_=I#;=sX!y&Q#EGi4iwr;d=&=iP`ZC z^q19_i$&AN&`a_%TM0N66*k_ZZKej^uh!jmMng(VuBL|2jP}jwaf4vdY7NKnqY-Ho z#a#ZeQZkSqHfkufm?z8ISj z$nVBm8=lYl<&b>$V)F2}-}lHmo8Il}{nXkkjbsUwI33A51=E@)g-_>RIFhI|QRyTV zZL_3nE%*lea#7xWV_s-pa*v&o+u_k0<{U?H(w?|yX7ay|+QbadA<1q3813da zwDP@4e1776^O-gK&8B6_U2kZ=cya|-!?<|3V}d=?#!YJ;8e0t<$q`nWcyx2vrZdS# zPx6qbE>g?lVC*`JYi;DAV*%Ijvtd#kM~5T(tqm9t`S$(;ffDvj{T&P3``~?j*Uj#f zOT_%l4uQsza8AqVSUp0R(7KsZ8{=K^Ek_uO#{){7tb?rVSDA1n)3~(Sr<1^A4|tbs zp#t|*WRiOGwl6Yka9=r7S{MCf0W*$U=2}By%1roQ(68&_mOm}YoZjethB(w6jB?E) zoM91FiIO7zE{)f%9~O(aT6{O|d4tcCd!X#MC|z()L$baYX6ALxz%2Qc+}J|RyDzO> zls5Iy$y+w1c@A4b#y9iObLEcSj~;3?IVdou^WOzm01Y#=(_l5`S*A(ZeUd^ZaRu|& zHxS;UI;|pEy7FH~=1U7Om8Ch0kR|bw@Bkb7|8)`Sbs}WVZJdsFn}BHDwY&&cp|%6SV%p8%~8;`gGKALN~-C zI4j7+#JMaKlZmzet(7-E`~lHtz`jVy5{pv1w#)>Xywg&}s;tM%HyGKvtOOP^=4po` z&kl;bdSLY3?p8uyJ16#M;pim?O4seUEat+;s6I|8nqUTu|9ZiExD;zHd|Ux=WJO z$)8!$f9|%1jQ6beNO!H3sT#3;m3#0mdTkAbakIr zBExSa{a73c;SVUlt9aab#`pGUT?y(a7LII@E#{L@p3|!|^L=X^vk2mZFF0s|=lLChFxA_yO8y4$be)s%H5 zUx1cH?44kQ9_QGoPO&7h%_2d+&2VOQ*Ni|@dnP+wbq*bF`qNn^+Z&@Y0*6z#*8;}5 zNM@_8!-hYSry()$0Q3Gm*KfFmq;G`$phDWgAF1=XFcC0zk5Q1K%xd_a8RK1EF*W{& zlfBLX+1uble|T+ivyc9ua!CV!?^Z_QwsPG&L5lyC8+ z)@c5R{|D<91utkSkC2WP8p6L~g#^k^R;-@;M4_m*UUTvv$@uE7Goh-VgG>vwYCu#QD>~J0W#~^?DQDa8Yn``N1~t zB!H^pu9|)Fm^zLl^s@fK%j^jOv>WX_vxk$N9Li3Y%BQGM!n8IJx93b&K}l9J*_2^P z0V*|^rFGV|><2#m4`1!4XY#mpuN^@wi{*40xCc@yX##|M5eRQ))WZHoHrP9jU~yBPS&6eQoUW&WwP)Ar*BcQJx)b|Q7I`Lkw+~~b#m>HNefBl~<5tUN5WR); z^)XxsZAf#nmVA;tk}Clw_EwF|>ASwzIt_VV zLk3;M_o_rUtjLgP=855>nBw>kh8*H5~u<^4KCaazQ`%pxJ&&M;X-?nc}# zmj+NOh}46kIzjhL(_osSUN)(S*_NbT z=%-W~ro@T95OGrp{o)PxEd%RLe1e+^s3ilbN)wl0RaC6L41DR7I0^c;fz2GsHC8&) zBHM!0a+!y+l_==6Z4F~na%H0C)siZg3wxvZR(qR1zXA_|6Hd*54yaj5TxdS0AZBeW zR8eItOXQrnc(!-B`p0F({WK2O|6%8x-*%Gfj$hvF{WN)Ct^az-Qv;{K{O^Lr)AkMCaHViYo zvaW}+6MqZoD^$L@-H=#KzsmYYYjO*)_mBUtz3+Z&s%zSfqM{<8qM%e2P!R#8NH2GVS1YvyH&7J_nBwGsBOyZN~4o`uQ>)WS;$RQW2CApB6b>(Gl>07IGG_i@^Jy zex*)GFrv_dG=>2(V1kaMFG4(Aplr0-2yC0Jx_!hN6@e9z6zzy%1K_dOGVePM*pveE zOrgGM-o?`#*oOw43NaP$&nrgFAxdhZ*qD}16a01_md(vrP?w62GJR{)T@}4rn4)_Y zt$qe_`^)kOA?xup1~r;9qK)?7H-F?M87t<>wGNVoT<-tun|9fT`V;bkHQ)P`?%%S> ze)w-)vdK4EtMmaAe}*^xGx_t^kWtus4gpLb63J)P`KTlyh8YZN>`}?N-x?#`v`+TX zRUlW_?jf9CWH{?0=ht(?goh7rUbgb*ECl_yFHh0C!>SVi~jGB~R`6FNnwyioM8StCwS%7IL<~ zY~z_ZUfXRb{Nm=hE;ZXcZC=yH8P7@dO1ZN|E)81$614WdhW|2*$8|szrpJ~aY)yUa z#IrBV5nTm${fJv$x>aa)5Q0%okqniGQJv<_PT*eJ!-%}&oHm5<%W;BZm*Y6S5P0}I zO2~WTjfTQAT*c+K1BXz~wnRM2V>@~0*GEen3RMxrRqv63xF$;fW~Fyu=~2)%C-Fnh za60z^qy_?lVrGq7mO{72m(5f!!QW|UELlxeQ`2A zV=q!g&wmXJ9YLKtCM1q3?_qzO?_KS0nqC{={AI7k-?WtcebnxzkNGGsaT}j4E>eUX zTBwL4kJIM;Rm_os*M0~uHy|NYV`wOA?$lk#f&BjTpAaCWLG1Qp;5CE_ z?s;_JN`{hm?Hv{P8^NawbXCKGNB4hhOEvI2zo_|`;xiDn-qymtxze!TuR*3JNPTa# z{Z6mtPZr!zP;Q_yQ-9gD!urEd>7DPFl^&@;?tX7^M?pyW91RY&KR;(I`PyA#N5Rn? zQ&01DAL3$qr@r>Uhp#BYAMM{abnKVQ>x(Z0Bpk&S8Ad=B&=gixfADsEPGGURMq~AM z+DplD3A0gs1@Uur6t9%qE(l4rpCGgdM-ZlfL+xvR{UumiL-_Wh{w`G6;LrGpRSUSLTP?#P&a{ zobTu<1Ojm^ktQhd0cXz6(f9n4T*{A-JNlhBiH_n+#D$QGQc(RDyFH?F$+ptks<7`) z?*lErb*W^8bacnE1zdBdzi2 zESsdyg`Gt*E|rPpz21b?{joYp;-7l`&t-WdPey0i%}p^JT?Y5$aI22BBP<(n$P4`6 zWwS>|-c|Xm{P&GXGKWr?g5?2581hLwj_Lx~9g z0KnT1wcq1D`QD-sbKjTVC4ZOIcJ7Cg%s5-K0O`kYk~Ua+cDl-4JLk7Lva6vkap<9V ztB_Rt(P3-5tGBs`U+$35cWQJ6zdtU~2j4EMTqYwZNw_^eF`kM4ep#aNEgpnkEw3*| z&jn4zBU>A>Uy@{xoQ51!@&q96kb_`wenh|BG4~M!xH4o$@Pe{0eO^NzYxNDAOAEi6 z){Z#T>@aLvpqsL2^Dn_qo0Oea-%r&P|MsFL@s4^m??ZTB8&M{UBk!VmIbYD^8nLVmGomeM!(Ui8xykozjk2B* zJ)R%Vr9S#g&Mm1S{)2+%xe8PA*IJh|^#LQwqn79_bo0c8JrLX5<`OPNj zewXD%$YX`9FVMuT#Th*j$a}A{>Y2YlQ>ZJ==WYhppyrs(x$HNwkh{fcL3O#A`XTe| z+H7YsJ#a>}g@_RS=c~Q>`bwPcvm$=0W;&#eSeP~=n9D~8T4ZN9iwpfe{hFTcyPJg$sn3nc{Fl!5kG}aCT3smK`fQ(`H8+J;tkkfm z9eUIrWJgf^DOl!rZzNYRmetJ*JDA11?+Gd5)LwDz1gJROHmRWVROZ zh~!LlUJy;cC`GgI{@9R_&{zy^mXCa>2L8p{cTOjH%txMSJ$I?0qw^T^-@HW1{P@M7 zB@7=NvYaT_nr1=nnGPeyLmE-$dCK5DSO^y&da?+YxuA*0lyK^!g4gG)%e=)qjdT|` zp7jL0FO3cGfnjy7WzwJxTX_Ynyd&0AFIk3dET2VBA;+%gn&px9VJ2a1ot=#x6FT&`pame!qPc?e%_*tTY; z(KcYd=>19z@^)^<7r>kBfXZIskz^G3FO3pfD>`>)xmb7_~5pF`0^yY6aO`?tmU8vhI?He>NTl<7CdPE+D1Y3 z9Y%q_iMULgyUWG6+j-eBf=%g3LN@3^^SypiRPI{-eE{r~>1X?m#V%6#>_2nA&UnWM z$-KJzfdAMvQH2)z*wUBK1{*=bfQbr?Pw981V3>PTpadITfvx#0pF-1~O5U;ljqW`A zgwpI~n(-o^Pvth9TseAf#$C%>=VZ14gK#W3ubzNArTp$^Xii~3TFwWBeck(Ac0=Q<)E2bsCKO?4+>ebk@gX$l!HY2A;zv&V6 zHjHcbXvK6d4KiQx*%!XgWglXrA}KBPn9aX)U_EX1bT|u-DqI+|<#<^h8oPlhUD=`s zGK-ZxbhC@MXLWQOK0vRcvKF`dzGFIx!@AA}slK^=qqm8;jgFpqViL{fXVSQtI)_-m zpvmS&(7^jzN6}UYvBXUfpK3?BzRGLNKD} zI9-+ST^hxa@A_NeENv~t>v+EdPq;u#$AGt?zUQ_~y4&7mpkM;7|L~7R+^;Xx?O1{l z75y2}!8qf~BrnloxFYpCw^_6>rv(Hg^hoC+->^uq^QA`{N8}V?}g9DE+t8* zFRr`ij;`lj@6Ey*XmWrhS*HLE7Iyh5gt$_;y5-eXj%gVi`OADc@H=9YB8#C+kOouH zA7%~c#A(SQg_%q-R;-@?xW)xRb1-gE1**Y(5cVc zB}Z1oEAXNo$1VV&cjx%FyCvSj1}n!P2F zu=h5wmP+d@-+4>#2@i`|Y83v>ua?B^Iu*A`k<8G9<#u)(%c0ZX`p(()wkA|`oMG-` z5(NwiW^ny5)phZthq(kpR7<|5`#(5EI^1gK&Mnz*G5$W+`e#Xaxsng_VHh+DbucuROJZdaCnB!RcTlq$L zG}o)2s|b);x0MRYHr`PbV<*A0;RUs$5IiUU_!}uJuAZ=ed>2*e^@<%O{E93}XW-XP z9s>W}+*}dy<=$Of?QRi^&-B3~i(+>1bd=BFf;kK;t0gDRT3Gcjg?CF}?sYkp&cxeG z0Od}&L*B~D00}%wJG27iET5im#N!Cl^wo6;LdUpsfr~Ws-NrwUDgO3`TwP{obDSty*QbfLI{-u;lX=itGO#t`ljLbJj=5_S&LoxOLMy>O6px5TAt7Hm7#o_q(= zp>&tR0kgD0Am=$q8)MY~oEd2NXxd70a?H-xF@n(1y&SmrYTq5i8oU$u zL&_q^&Fa1NrT`ETC6(sXFB#e3Ljncba z%D3F73uRkZc@7qq;d1qldivvv4v0%Rssl?Uxz_89LPBZMayJr;$n)uZ$wlA$Xi?9l zh;-50UT(_7i$zt&8!2$M0`CQ~VVf5wMGQh9jqLgYqz+M@5Rvckv7(G>bLyHdDtRf7 zeI2n)3%JHeyZSfgZM)&9-HO>LZ3c(v+x05l$)Rp<_IV_|9VhgS-@O4f#q?-0q1LiY z35T?jdsl=)%7LsPmMC?#H0d!7wH&2~d1B~K%gqSJ*f;c-nc_s8m$sLK5)yZQ%o~pT z9Rx0Eiw|f~B#zNIbv%`06&kp>jr!xNgOB73d_P5g%_<#DR(|SZH3INL-Lz7|7M&9l zkrf`F6R){CLFA6a0pg6JNhlkTJ2dkQ=;#gyY`~pchb` zdz5lqBlKeidfS?IyqDm5l4x0nofX&BO15fmyShYuJv+?k()oIo2?@iMBTEc7s4?S+ z)%>|JtgQ0{Bi6fw`#w^ zbyz$rDSjCy*QA=wj%F1VcMy)bhuG+Dw*^viDe46pR3{Hgjl&epqI|^m1?j}Onm-Y} zZAa*7KUU+xP}*uv!#6z*Sinet(1Vef?1EO*XMh&b-zn`(ouS0V8_8@!h*>Qt>i0E% z&ZrWupVi{3c??eYu@6z^#q6{FDBQ0XrFjYS+{;=Pzv=Vk9&Rrd{4=WeB%{Ja0`VW0 zk7hUQpq5+F;VzeBGJF!VnvT6gtM=M7y(C;ul>se#LnV51WaimW`Bqj?+S?`MSEZr8 zx>q+X3Q`8HrG4gM8Ajm=7F05VGFYsg@i7zh+huQy!+Mlz3oAY?GI*)QE0-dJHq34_ z?H+)-ZElv~&dehLDT!yS{4C;_g{niJV}JBBq|edmRJMAq5(yK^Q#qVw%5KjqER0?C znv>*(s;T&s07FFeR1tryiF@DK3bCLPs%A`1CFRrG?}n!(zFo3BzD!cu$6(Wy6)SJb z*Ygd`PGsZ;ubdIhN$4*4#yZo4B`De?rVUjmHiNvW=Kq;{UR4q@c^z*=)Ag?K#Yi{% z!|e@osM2h=6Ts)hq)f`2_-BfaPbof9u%!>xE3Y_OuVQ0nj}Yd)>xL>DnIq1(a+wLl zsGuF?J3uJCT?@ZuTbEd9Qdl=o|$K(CSK(9P`C@&}% zyJp@H=t7$@V=PW=QlA?*LdRMd{rq^u%{>Cgi}KJkQchE96i`hPk8uWgiE{oSu9qk9 zwNQcEyckuKV`N<5Nn)CD36dJCn{mE7_yF=9vt#fJy%06K&>xQUIp53%3wizXKxjBc zX`t^>sq*lCg`9z$<~2wRwd%Nu3eVm9E8PKFI@!5SZrK}cMoThT1^JDG ztmg9@UPErVc^241?E3Nk?V?mYYvJ7Ab07sP=|_2}UFTOSxlU)WLRlf00V~Tej{=C% zOFV_Fm7j{%tAOW2VAl&f4Wqv{Op{GD^}QN6)p17D&Z@bx?f~!%6D)P2sBMGDOXLM- zFJ)LJCqq~9cLlF-km1nJD;hM)d6ITs^e@)isY@l4jPUqCPCVc+W>1^Qc%%OqgeT4^ zW}0%dkXrj92BOLqI*rDmdgUwWAZ0AYzgvGlOnN3ZF^rJI5r#NWh^-hLk5pjgb$~` zQF_GM;O0*PniGZWrE-zMEBiyzw#FsJg|^@Ltj}WHr-tTN44Ei<-&LV0mOCv%Jx(qw zTPSMHx!i&2T`#maJdfe2pwTnj;8y%DZNCu$tyy?YbJ>-J!mn*$WA|H+@66lh2~_3U zMxZW7uS|Vhr-1?=dm|V2yO9-lh39hmV+~d&oTHIhX}``oO!Ali-^on0JL0&%cJK1X z>R;su(_~gb7_H@T=Hb!=BcMGnH31w3p{@>P%7itSo&abDk%mE4ZJRmU74N-|7TUnA&8Y$5i&Q(NK3$bjT z#W2WT%_NQ$kxP$z%=)vvVLOLD5|J|qFQh@g=%GKL6N0ARieol@pz*ZVv9E6BR8o2g zqP8=?cg3Juq9)4)ojS6LEz2Fr?nlzgUAMEgFI7w+UlpysaDnb$N;2`?am3ozkN3{% zOt4#}9EC~Sm>J*94m;pUbTwHA@7W0|KD*x&2e#%`=v&})r4$D*40l?$Iri1$i=+h1 zGU|S63kg+bs&Mn(Zm>7rnWAnkl&o|v5&-IYYXmXwveEapx2={Hgo|R{LX~@<2(WU< zI)4gPUxz9Gj4~Ai4?D>+Z0B+MM-cS}XRUf$R}3s(9o3^HasDoif9?_#jI=5+V>}#9 z`Mr~~v(cA-A zQw*WXm*9(MP!j@pDtI z1NycH*p^zQ++bLQG#Kr$X%N83R!>XMAXR#@!rB5+d?FsW%=!%-eeg|I; z+IMcp_yBO8JL#`jF7(&Z;2wHD$)-1m# z;#7%5j~VN2es?o}Z&=5Up`LxpeAN4~UC`nosHq_-SVf(fGN2(#tCu7&>QrvT_^S&+kNHCaT7FOL#$g>^d#Z3GN z74?bX;wwukdE~HdHkLtO8G-YIUAKPXE$tEBDFO{93!Y}pMGos*ug5~2(i)UpZrKcS7Y`8eT zUXzE)^`=p@%n?ec;0}XbKqKo_@$9e=OM02 zYms-IUc5YLcgW7VTB%gobIHs}4jd*k9hkuV=p8r&<%RQjw%4HNTs`SxLTu}#feWB} zmoLHb}QtD zvohbxq*EN~r0BX}s!J0Vcc^6NH|kZUtT&1Mks*6;Y9=gFWvI4B$Q2KeuNlpal+gXynVT?4%w;@Wl8O&)ISt9|0T;kr8 zLt(#xrlQrPX*wwTPJUyqcD*&gOm*qL)o?;M&~k|FYc|DcH_DZp5A$0c7bvaoeY;f4 zd^kTlMgiZcc9CV*zl_YC2VIW*)T)H2na=4!>LGpowjULp5#>79&t4}2(`qHD*w|#(=pStE%Ybk}ORTnR6C)I7wYiD$(kDUf!y5^`P?Rw>8mj&HYUvCr6gS z7h^U4gKBEHlS%J-CYcoW;x(N8H3Ih}{>=+2f#oDge}=_x|Hv-W5%e4O-9b zJwry9`tSJey!BEA$wY*XaCdL6i?4XrI9}uSg{70v*_@7@f%QGA_&B1ey89)!mQ&w+k`qn*xKz%iZiu|3 zeH!LgsQK(hE?LK+s?J!w`;QK+VMW%Onu}Z%vl_70Ezc?2M?${MeONm23#u|3Bes7Z z1rCK#Fp<)7#B3>7Ff~tX`>kBCwwj*JghI?W0yNMl! z(au%qaa_Nhv_`2y+ec@!_I3WVxqZelC znYoKq^b+vvEdEMC+`HM$=txRnz-}5#Q;RZz@RJu5DofBy-x*jsbFcUIKRJKT#RIbt zm9gsbloF=aP+V*$lqATgvxptJU%a>5U{oZY3NyE!HcbZs z5}H}3ZVVuoK2{se9rVP`CG5U73=#PJnx0aW>mhdBd)s85p_B20 zvqk2n>?yU8U8=A(+|ToHJ2I)w!`-avvaDBG1NF4E_70YBO(kQMyit+U;14yul7NIL z5Fx~^c*MWWJd0{$W&^L2yG#oEYF7mSHT>r#=+Tv^;#pkd{HV3TY0*J&f&zHQL~%yL zgOogG%JcmNir(UiRoSeH^mZTEK)xJ0bOD1^MXQk=+Xk7hEa%nAhE;6kaAW>qRq|ha zVv-Xbn0f1lQrOy`-O%+o!(mXMKXGVQsVQj=6DmClj}~3LxKiH(>JejXu#nkQ(v-&h z3~|>hdW~t(G95thL-06>D9r6`pQzGSp~BZtAN)KA$TMI9TGzRwPi$A4$#py&{G%du~ZeMDq=3vk*@ zCd$o)X)*!;F|hb5txIsPFX@JJcrR@fHucHPfpgiiT6wLyLqk^))UL}y+2=vDHgSli zt&QvnE?(#2e8qSXPq`|XzYSm!t}|&mK#^EdxW1K^e&vmk>CMU_9{BB95M5^Z1NwsFJ87AW|}KZHqZ*A=?C^TIg z95<9P##?B-<0bIx>b~hk9iaX1r#S?==RlOnr$_EAySmyg{@j~X&?(7{2Bo|1K*20G z;hj?fv}`TCv#OcxNupm;eU0sRcE{>)@VATLHs!Y@6I@O(vnc5NYl7OFvAEHtF>9kD z?ox`mw3vv?3Im=p^5P-~Z-85Zb-g~++%aKV4CJmCyYIAUT^HSRiB70qrOy62*YRVcidK?_tx}v4}47F3(b|3>vgULgOcQXqMEQ8}9q{!ouK%VEq&@SHkmKhkE z{UXKOXSLPuF7-d5f*|)2M)V7K%ePeUp~?5FqfOKQFXDs`nS46%j{R{G)?~-zPgs=5$t`$s$!L*5fA=uEUVt$!PleDSKmg>AH1>7Yk6~8 zU3_)x0g%pbu|qr+cM}fY2T^KfV%4exA2Ld;#a*I$4eQG6@u8|{28?#z>%zdO>lg&S zjE9(#nGR|=|F%Q86N~#E^WpN7k6#~Q46OKkF4zBg{)JG-i7b|+H#)UD%!}M|Kx`~b z>aEdm@X$lmj0BV>RBF9|MR^sm#a}dy@z+vzpQqe-{ zB9nmcx~TlpMftQ|8;<6Nz1SUx$(?v3+3ItUPO4ob`MPBMB(g)K`mOU`9?{W7S)+Er zt?vl%N-)^tLu<}lE2O`YPkg>T(W+16XK?aj5-3g{N$&C3_U2I)tp@377(dIQK+Q{- z;d)$t_#MG`qX%4PNuvlJj&g6(*~tXu?0uCm)=u7D=$!zq3?_Hz!s#6e`Mt#54lZep z0dOe(zD7j`B8Jo;{jkB1(@cJ^yTdOoqr@lkj$TX`Z$8961moX`H-Ze5lu3nIH%Hd1 z5ZhHH+-7p|=Sg^)i3NJ3G_adAjt_~O z@_GKfi-Y5s>FjP z6;%#E_#x55wn>Q=Sf4IeV1hlUDtwl!>JRTfsD%3DJVcw03{V*)F09G+#vOblDO%ql zX?6s^v`wh-HnU#`CpIt3OVW@mAwxLKZdfG`7!Ax)-W@rRmAR_B>;U9R@%TG*e8Jz& zBf$v#3+t-m)>+qcnuPR2&6qUB+N*NPM1mg+g3VMq{6e=M(})NfT>_V+ z_DL=eX2HI_i~pMG-3Mv4Kkwm1a$v{u^6TOTlg3P8VYT-U zdo%?@m48*}P_x*9X7Kw8N&A0O{W)og|92LkYBZ<@_-Dh+MhFUF5u`*rti0YnET}>& zSO&UwlZB)JjxP+UL=8bexXpVB+BB()q-}-m+!Ty8&YZahHT`${!B5*G=;_1w95;C;4}~^V*O|6&|RKl+%9_UC2g!FgDVXy-I?SJ}kLeSs; za~YDt|IaeHq{0I&;HJ5yWrneamd1pi_omGNvkslY-{YDLpF!=lT?+Sb#Xk06NM))|mrGGli z&+~DQ`{U7s#o*S2SKKq97U~~a)|x7=Nv=lssZU@d{8rrAd@U?3Eg$q8mv=ZrMa8{5 z5lfml31Ixi8R+R3A9H7-(yC<^ZCWX5L*^VPUcGu1yHEp4Aelk+ypCrsTL2tJVwC_t|5mK?XW!)a>BJc4xgri$%Afz3V5Dp<0>^r+uVz zRhDARhpndvH$Sa=WoLb)q@t?1`ptNlg<&({^XJbLI8_L+VUEHiKxGbR>G71;XuhXP z;G2Qx7-!ZaW1E0pc^vp``&G2}*03&sv$#-uNmBcJ)14G78Vlt$Hy5bM$;nY|PIn^P z)}9j!6<98B#ydMWlls0(YRhKFNa5lXmu?X5YVYXiST&cd1mxu;U){Rz%{43B@eFd@ zBcyUWjFOTPYfUgSJ?$iJlqXQ~axWd$@y+!av7@W2ivg>D4z%FWQ+pM=(8zzSPcTAsv+9!#7%lHS+9vD$&jZx##J}z@#ZL zh?&(IlH5+t(#v;ARW&z*%uE@PENle5pg`iMl&8=zJBL$E1_IY%r;PB}t-}4J2n5z< z6tQK&&RnDR*Y%Mi_hf-c5$owdNS78X&R!4{652JqC>ge2u{Vf=Iy($QEbrokyt98w z2jBALKNY{O7epAp1Tbd6W$SVMRLwDJxTO)8E;D~u4mn9%-)KmEYZ>90{21T(3; zQ}l9b3n4Nefl0Nk|2wayMvfcp*py+81**B6vu2&I4^(o>YonKZZv;Y3s;8v$U`rk) z4f}a$mvb4IC7D&xC>A~Nun1V~5t5t-)Nkb*XD{7S`?z-*Mj zJpC@`FmV7^o#Z8NZ*AUi@k6*LIk)cCLX-{@SFl)>0Wb8kjd9z@BfIeXw)!^^l(Qogxh<;3iZJ>r?7m zfK~>}c}XIl{JBZcXw`Jkq>FEnMuWbFT|5kYLO6}4YsS1t)XMU4KrghiElu?Ec6T3@bExaawPl3YMRAY#9b4OmIsEyk849$o)+FBy-@igrakgy#r;R@nrs}L@_;b zy3Ey2GgzZ_S8v?VW$tXdzg&mnu=4i~gbWbA(dvLVNg;j4=8r$Xu4=f8llD@KSfI&oa zKO1kQFt9t=_haTVj?{SGt3w9Mmw;p_-aRYjw>*fi`gqY63HP5B>kKo&A?-_XJtF&v9(&al4{oGC*WqxTQ*ZW#j5IFM(?7*52fq1A%*jiLIhEyz zvlY9|B+V^^ZfCLG82(e10m`AtDd0aQs>NNag%(R!Dlxh#_f*V3PB`vgZy}b3Vu!yi z>ZNJ4+AUcU8?KkB5-%&5X|4`qOdk==`jwlu8x!$u|;Y1D)2lM=;8VggGuExTd#PId&ZtGZ=Y zq((>ho&&yWkQ%9VRTNe`O9OJq%SoFrux8|3ypgob=oc$I3w79pc|DdM1WLDZyi7Hx zBvDlxochkY4y~RqH9{tUtlD^J;!Qqp=j)}^cD?c4R}#^mBQ|o{pA*rwh7xE!_rhQn zmmqEsSGN*3;LFWa_%l=NT<$nK&Vs{S!sKF)7V;8Xw!T&qcRO#U#eBNh zWp$e2tdNWNREwpw2xk!|;bD8r_q{$)=D-7*b}ft1r9zbA2FvKOnAunnJevFV?8PD7 zQ|aRRfwGO@9pTYHEBqy`t`Z1rf}JB+P2}*s6UPp^9*nz->!;hzO6K*=_V2Wt5e;m_ zW6OaW(Sy~&Bt0DeSQ#hpblpc9Tj!Kf-Dj8A!w@PgmIMQ1YX&{vzXN-A6Kn&_D^jDT zgp8ILlP=f9)?!*It>$2pV&UmAmiu`F!zyR z@&|@C`_%(pgCfIW+M4V%qHDT@o;s7LweJ?~w*mG9?c0q4f0ti9G#}&Pp{R%)Xc4)5cdeIwS+)|QK z86G?Rc^4?8Bv`bsxgqzv=8<#4Gj)XuBcqb+TChwyItA_3Ymz6Hn@a{*^Tc7b{0AF^ zHB?>|2{l9MSUJHaU{)!d7LvM(9Dr+Lck$Z_V<+oWt$63D=rh zv-OQX5aM>sc% z64xjG;d*6NO!1@)w_evTIV_X=SbXia2+(s^06uqe2p!wgVUB-He~-DV%_=fo$6@Th zm5j}J|F#b$SZ}$>`(4acO$X>Q;1#o7;rE+uW$g^izyBP#H~(I0goFI-N)-Ij;_bFm zBh)?Ze#9l2vrCw9_fEiS7a@GWaB%Gzu~j@m>Mhmh@B22f#Cx%n``MGH9YpOXjE~Xf^Q3XV zTvOl}ACgIKHVaqjIiQ8Xo=8B$isrl-N8^~wS!TR35^cM}tJ0gpJUb@wzp^cqckp<_ z+2+noKt_(j@_C?TEk}}*r(a)4m^zTiW8vI%M0ndJG;A)LxO+4F=-t8JPRn`C?3+|F z*e3(p(m(@v+nIzC2*A-}xOd#;5msSX0mGfWnb2Q#iP-GX<)3S%E41P^fh7Y;+rOb4 zuMI>b{NQhw*8_)*xCAXIJ@potjS z)QBaGRv8ZSsA)fB1RJXQPct>rM`gpWWWZC@!)>Z0__QzFx??YD$(x7K^)xL18l=;aI`lC0C5=$Bz1pd5+ORZ6tnx8obt1X4JzxJ4(4nTF{BWLWd zQEKF#hK-@^lRI90MQ~9kO0m2(1{yg{K)ds)q5kiD)@c0_66Pu2g@~W79Pp}KtTdX> zw^)HMUh{|&1`TLPUDAu498ye7>_&X}I-obF*(dOIV934J>gfqWsgRyK;P6!t<%HR2 zDa52xNc#EC=Cau2pl!rpB|Dyq9OG%+uPsI)S^N8vc1VCT)rT`RK9zxCQ%jIg#TG2^ zX4#i%xih5PRNUTs;5iKlLh|WI>%R&B+jVqV^*DOz^dP!l7*rS-6rr&gzPc54ZfyPh zaX-N0%DGQ8l>6)URh@<6)Et2SU8# z1~Lc)$Sc4YR*zo=-UD9~;Vf$SSRNU+0ua}`1^USa=BS;)FVu-rYUHmQoZ#xCYmhqU z0Mvf8VQxYwAMoNe7YB=6BQ4Uf137YLS=sG32Mh+06PRtJMr#>cI@vHVh|i#%NeiPh z2xj$xuLpidxMePJWm8EoY~Kbs9F!Pbh9gZK{t@_eSZS|JrOI}_+!DC87k+awvt5y6 zjz$c~&6bT71P3M*CF*zQ$cCvzkHh>=L9k(>Si+6X6W3?4&XFXMjb$dNA^ef(?~VEH z4L6P%XT0^NpJMw(7**a;;N4&tMTl#S0xAq*l*i|f5LRR5Nc(U4@C+EI05Df%Tw>1P zF;ybsv_j znQCyOVrfuBZ(Wn&+Pf-XB})$L+w{97IgWVUYXDD z2RzK_pQ8X0*(!6R138TztXlz#vf(jNE6Zcugd5wR)N?;+)`fQ7p)u8&YBQOtEXnLWZPX{(^TSYLs(&Pyp*K=R7U=1Jk zLiau(-Rs(4bT6**WLod6gR1G>e2KVu++}D-KYfB$EeHZvD(jr4hUMJPJKOLW)t;u1 zkuBTyD4f`Ll^pEN7Sm3WbVxu~Wpy_n!E1g_DYNWVb%b-(%Q?L*AS#SZXV`PHrQlOG zK8#}in&VqDkiJ$zto%Z88)S}2lPj6m^+wCe=X+m| zLm4VmFk)hJCUG7gbvuWiPSS&&35Dk9PWdJ|U6c05v*+SAS@EcS{m!AFyLWp5O=lCH zAJkJaI1Q^EgCKC3`v@V?a$I4a;)%}X{2kvKs;5f7M9x&tN)rK{H<5=_O*AuNgKOWg zDKKX$a`f&NLWL>+#?fuJdGB_IA?QHJldB{hiKEU=Qr11jDY&Fd&{NRWtvzTs9#1{H zr#ct3erGcX6sZpFi)nx=WpAj+?e?)zO1oKHU}8?=F;z*HCz1)jD;DCZvKgoWO*%qS zqpC|!B$EnHYR|fw_|6=$sq%M07rD*m_x9V+wcxpP_dvXqP)ZNd3T%)@_-<|mCbPH| zti7&rl62?VOMEbXoG~y#Wy;6ISHcu5$6Fa#bNpPjFIyY5yO%n5E#g>GQs?ip`HkU|!UHa^!UD?XV=hi)QyK+Lgn6qe-_ zbx^My$eji(^C+}K;>p3jJH>zShMfbCIn3|c3A`Fzq9(*s;&hr&&t~p*E{&Ss>h&S@0^2Vk!bm2SPg@#7{DW1 zy*$DpiC<#zDZAeQZ<>NQNf*Rx>OIbirc@CfYFVQ3ryg%u}C=bAiiYiRQk^cabx zT%@G;mtgU`rO#pl9&GVJwZz`Tkhu2+*1h{kRrk>G4{abOwplhpxhGE4yN0~;Own7C z!OyLZ7YAFwU`k4mL@e~u*MS$1EUo7oREl0ms`ibh{Ny$!p)XBm)e-XU@>+}ocf5)Y z$P-dSz_YE!F>|dx<*7BrCgw_xH_pR^;Hug5Dgs33z;Kw@D)QLF~fXZk;uGBMc@|K)M^6k%bbCx*zemrpC9x0{(ik}?> zDZVOT=7Y7{_mOWt`0neK8eA*6UN=kcl%N3d0{bS*%2NqE3) zbKmkZhPE(okZ%GnOP;Oz{+&G1_!r*KD90^vy1sVXD$bJ{Ja(V776T6R#0h(|tAyC^ zR}7Nu?$_e`E`7Q6jAft>8%8!v9I7pl0*WIK#8m5BwjTXr4Zre{K)Tb+~Cj$E8vBRr_iWkX(+^m0quby*-|5V&jc>Z(*>N~)}Fs#j>iR6MQ-Q3 z%289$edcjS{n|5P{VOu%q|<2C8=QcFSwOSem8Muj_)c;sS?8cj22!kU4YAmr>AY4y zVO?lGfv?uy)J)2%MtNx3B(PT~6JMkSf#)Qfu#)Z!*J#yETopn@LpnwAdaOBQY{&fS(+T14HRIN=rk zjnLaGfyRjy!_lsR6{~$~iF@k@Niz$&1os655yvJ|jRfkD4E6O}*G^(P-C`9a3;1b^ zsXfkSC(bN5oM2-Zs`!3;#iTb_Y+uAyOBcfpgB>4|e|TTZ^`}@9h2L7<0(Z9Uav3cI zflNRjVLUxYOFWVF7OS@`{iY*-;L)yT?zCCCHlsstBih7>G5FYNpSV}swAo2>xN$d5 zk)7lJxpUy<a&bLov}+h%43XqdVUbC?SkEBy?X3h0oymP5;(OYLh zLbYr`LQ}E&ZExLH3fsYJ3EGPlr<`BPfs zbLM7Bc!N|18(w2syQhQ)q)y~k7JokiiL-V~?9xlMe>4|cNmiD>xr&LE-QXgsr8F5Po< z{*$s3@gQlf2S*0MIpk0MBdwW!r!UNE@^ISvxZNjQN{PjOhH`njvt+D zEe!UD7iv6^=JsQ3JnHV0m%`2qyj!Kz=x(%tHJt-}@6>3zcJQ)v5Y8kkJ>zAsiQ1_U)pn(g1ynBE=vecx2skgAzU9Q+p5MhR`1; zp=pB8rTaj((1#V;Fz_)s2AK!=yuSj*03ZKz& Date: Thu, 1 Sep 2022 13:35:58 -0400 Subject: [PATCH 02/17] fix format --- README.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/README.md b/README.md index 665710c..1f54da2 100755 --- a/README.md +++ b/README.md @@ -1,8 +1,5 @@ ![logo](images/logo.png) --- - -# Sei framework Welcome to the Sei framework repository! Sei is a framework for systematically predicting sequence regulatory activities and applying sequence information to human genetics data. Sei provides a global map from any sequence to regulatory activities, as represented by 40 sequence classes, and each sequence class integrates predictions for 21,907 chromatin profiles (transcription factor, histone marks, and chromatin accessibility profiles across a wide range of cell types). From ede05a5378e3e625ec81cde64d4d327488eefe8b Mon Sep 17 00:00:00 2001 From: Kathy Date: Thu, 1 Sep 2022 13:45:05 -0400 Subject: [PATCH 03/17] resize logo --- README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 1f54da2..9d7afc5 100755 --- a/README.md +++ b/README.md @@ -1,4 +1,7 @@ -![logo](images/logo.png) +

+ +

+ Welcome to the Sei framework repository! Sei is a framework for systematically predicting sequence regulatory activities and applying sequence information to human genetics data. Sei provides a global map from any sequence to regulatory activities, as represented by 40 sequence classes, and each sequence class integrates predictions for 21,907 chromatin profiles (transcription factor, histone marks, and chromatin accessibility profiles across a wide range of cell types). From 6bfd482d085899c20c3130bee492f04c48e42cce Mon Sep 17 00:00:00 2001 From: Kathy Date: Mon, 26 Sep 2022 13:11:53 -0400 Subject: [PATCH 04/17] start adding new CLI scripts for different functionalities --- TODOs | 4 + run_pipeline.sh | 4 +- seq_prediction_cli.py | 73 +++++++++++++++++ sequence_class.from_seq_prediction.py | 103 +++++++++++++++++++++++ sequence_class.from_vep.py | 83 +++++++++++++++++++ sequence_class.py => utils.py | 112 ++++++++------------------ 6 files changed, 300 insertions(+), 79 deletions(-) create mode 100644 TODOs create mode 100755 seq_prediction_cli.py create mode 100755 sequence_class.from_seq_prediction.py create mode 100755 sequence_class.from_vep.py rename sequence_class.py => utils.py (62%) diff --git a/TODOs b/TODOs new file mode 100644 index 0000000..0d2861c --- /dev/null +++ b/TODOs @@ -0,0 +1,4 @@ +- `sequence_class.py` no longer exists, need to update `run_pipeline.sh` +- need new sequence prediction config +- `sequence_class.from_seq_prediction.py` is not done +- nothing is tested yet diff --git a/run_pipeline.sh b/run_pipeline.sh index 128e8d5..8e66dfa 100755 --- a/run_pipeline.sh +++ b/run_pipeline.sh @@ -29,13 +29,13 @@ cp $vcf_filepath $outdir/ if [ "$cuda" = "--cuda" ] then echo "use_cuda: True" - python vep_cli.py "$outdir/$vcf_basename" \ + python -u vep_cli.py "$outdir/$vcf_basename" \ $outdir \ --genome=${hg_version} \ --cuda else echo "use_cuda: False" - python vep_cli.py "$outdir/$vcf_basename" \ + python -u vep_cli.py "$outdir/$vcf_basename" \ $outdir \ --genome=${hg_version} fi diff --git a/seq_prediction_cli.py b/seq_prediction_cli.py new file mode 100755 index 0000000..d8e8cf7 --- /dev/null +++ b/seq_prediction_cli.py @@ -0,0 +1,73 @@ +""" +Description: + CLI for sequence prediction using the Sei deep learning model. + Outputs Sei chromatin profile predictions. + +Usage: + seq_prediction_cli.py [--genome=] [--cuda] + seq_prediction_cli.py -h | --help + +Options: + -h --help Show this screen. + Input FASTA or BED file. + Output directory + --genome= If is a BED file, specify the reference + genome hg38 or hg19 [default: hg19] + --cuda Run variant effect prediction on a CUDA-enabled + GPU + +""" +import os + +from docopt import docopt + +from selene_sdk.sequences import Genome +from selene_sdk.utils import load_path +from selene_sdk.utils import parse_configs_and_run + + +def _finditem(obj, val): + for k, v in obj.items(): + if hasattr(v, 'keywords'): + _finditem(v.keywords, val) + elif isinstance(v, dict): + _finditem(v, val) + elif isinstance(v, str) and '' in v: + obj[k] = v.replace('', val) + + +if __name__ == "__main__": + arguments = docopt( + __doc__, + version='0.0.0') + + seq_input = arguments[""] + + os.makedirs(arguments[""], exist_ok=True) + + # Assumes that the `models` directory is in the same directory as this + # script. Please update this line if not. + use_dir = os.path.dirname(os.path.abspath(__file__)) + use_cuda = arguments["--cuda"] + + sei_out = os.path.join(arguments[""], "chromatin-profiles-hdf5") + os.makedirs(sei_out, exist_ok=True) + + configs = load_path("./model/sei_seq_prediction.yml", instantiate=False) + _finditem(configs, use_dir) + + configs["prediction"].bind(input=seq_input, output_dir=sei_out) + configs["analyze_sequences"].bind(use_cuda=use_cuda) + + if not seq_input.endswith('.fa') and not seq_input.endswith('.fasta'): + hg_version = arguments["--genome"] + genome = None + if hg_version == 'hg38' or hg_version == 'hg19': + genome = Genome( + os.path.join('.', 'resources', '{0}_UCSC.fa'.format(hg_version))) + configs["analyze_sequences"].bind(reference_sequence=genome) + else: + raise ValueError("--genome= must be 'hg19' or 'hg38'") + + parse_configs_and_run(configs) + diff --git a/sequence_class.from_seq_prediction.py b/sequence_class.from_seq_prediction.py new file mode 100755 index 0000000..184819f --- /dev/null +++ b/sequence_class.from_seq_prediction.py @@ -0,0 +1,103 @@ +""" +Description: + This script is run after `seq_prediction_cli.py`. It computes the + sequence class-level scores based on Sei chromatin profile predictions, + and by default, sorts the variants based on the maximum absolute scores + across sequence classes and outputs the results as TSV files. + +Usage: + sequence_class.from_seq_prediction.py + [--input=] + [--ref=] [--alt=] + [--no-tsv] + sequence_class.from_seq_prediction.py -h | --help + +Options: + The results directory. + --input= Path to sequence predictions to compute sequence + class projection scores (NO variant effect). + Use `--input` ONLY or `--ref` and `--alt` ONLY. + If `--input` is used `--ref` and `--alt` will be ignored. + --ref= Path to reference sequence predictions to compute + sequence class variant effect scores. + `--alt` must also be used. + --alt= Path to the alternate sequence predictions to + compute sequence class variant effect scores. + `--ref` must also be used. + --no-tsv The TSVs outputted sort the variants based on maximum + absolute scores across sequence classes and are intended + for easier perusal of the predictions for those more + familiar with this file type, but makes this script take + much longer to complete as a result. If you are comfortable + working with HDF5 and NPY files, you can suppress the TSV + output and use the files in `chromatin-profiles-hdf5`. + +""" +import os + +from docopt import docopt +import h5py +import numpy as np +import pandas as pd + +from utils import get_targets, get_data +from utils import read_rowlabels_file +from utils import sc_hnorm_varianteffect +from utils import write_to_tsv + + +if __name__ == "__main__": + arguments = docopt( + __doc__, + version='1.0.0') + results_dir = arguments[''] + input_pred = arguments['--input'] + ref_pred = arguments['--ref'] + alt_pred = arguments['--alt'] + no_tsv = arguments['--no-tsv'] + + if input_pred is None and ref_pred is None: + + print(arguments) + import sys + sys.exit(0) + + _, filename = os.path.split(input_vcf) + filename_prefix = '.'.join(filename.split('.')[:-1]) + + sei_dir = "./model" + chromatin_profiles = get_targets(os.path.join(sei_dir, "target.names")) + seqclass_names = get_targets(os.path.join(sei_dir, "seqclass.names")) + + profile_pred_dir = os.path.join(results_dir, 'chromatin-profiles-hdf5') + rowlabels_filename = "{0}_row_labels.txt".format(filename_prefix) + chromatin_profile_rowlabels = os.path.join(profile_pred_dir, rowlabels_filename) + + chromatin_profile_ref = get_data(os.path.join( + profile_pred_dir, + "{0}.ref_predictions.h5".format(filename_prefix))) + chromatin_profile_alt = get_data(os.path.join( + profile_pred_dir, + "{0}.alt_predictions.h5".format(filename_prefix))) + + clustervfeat = np.load(os.path.join(sei_dir, 'projvec_targets.npy')) + histone_inds = np.load(os.path.join(sei_dir, 'histone_inds.npy')) + + diffproj = sc_hnorm_varianteffect( + chromatin_profile_ref, + chromatin_profile_alt, + clustervfeat, + histone_inds) + max_abs_diff = np.abs(diffproj).max(axis=1) + + np.save(os.path.join(profile_pred_dir, "sequence_class_scores.npy"), diffproj) + if not no_tsv: + write_to_tsv(max_abs_diff, # max sequence class score + chromatin_profile_alt - chromatin_profile_ref, # chromatin profile diffs + diffproj, # sequence class diffs + chromatin_profiles, # chromatin profile targets + seqclass_names, # sequence class names + read_rowlabels_file(chromatin_profile_rowlabels, use_strand=False), + os.path.join(results_dir, "sorted.chromatin_profile_diffs.tsv"), + os.path.join(results_dir, "sorted.sequence_class_scores.tsv")) + diff --git a/sequence_class.from_vep.py b/sequence_class.from_vep.py new file mode 100755 index 0000000..664ed2f --- /dev/null +++ b/sequence_class.from_vep.py @@ -0,0 +1,83 @@ +""" +Description: + This script is run after `vep_cli.py`. It computes the sequence class-level + variant effect scores based on Sei chromatin profile predictions, + and by default, sorts the variants based on the maximum absolute scores + across sequence classes and outputs the results as TSV files. + +Usage: + sequence_class.from_vep.py [--no-tsv] + sequence_class.from_vep.py -h | --help + +Options: + The results directory. + Name of VCF file. + --no-tsv The TSVs outputted sort the variants based on maximum + absolute scores across sequence classes and are intended + for easier perusal of the predictions for those more + familiar with this file type, but makes this script take + much longer to complete as a result. If you are comfortable + working with HDF5 and NPY files, you can suppress the TSV + output and use the files in `chromatin-profiles-hdf5`. + +""" +import os + +from docopt import docopt +import h5py +import numpy as np +import pandas as pd + +from utils import get_targets, get_data +from utils import read_rowlabels_file +from utils import sc_hnorm_varianteffect +from utils import write_to_tsv + + +if __name__ == "__main__": + arguments = docopt( + __doc__, + version='1.0.0') + results_dir = arguments[''] + input_vcf = arguments[''] + no_tsv = arguments['--no-tsv'] + + _, filename = os.path.split(input_vcf) + filename_prefix = '.'.join(filename.split('.')[:-1]) + + sei_dir = "./model" + chromatin_profiles = get_targets(os.path.join(sei_dir, "target.names")) + seqclass_names = get_targets(os.path.join(sei_dir, "seqclass.names")) + + profile_pred_dir = os.path.join(results_dir, 'chromatin-profiles-hdf5') + rowlabels_filename = "{0}_row_labels.txt".format(filename_prefix) + chromatin_profile_rowlabels = os.path.join(profile_pred_dir, rowlabels_filename) + + chromatin_profile_ref = get_data(os.path.join( + profile_pred_dir, + "{0}.ref_predictions.h5".format(filename_prefix))) + chromatin_profile_alt = get_data(os.path.join( + profile_pred_dir, + "{0}.alt_predictions.h5".format(filename_prefix))) + + clustervfeat = np.load(os.path.join(sei_dir, 'projvec_targets.npy')) + histone_inds = np.load(os.path.join(sei_dir, 'histone_inds.npy')) + + diffproj = sc_hnorm_varianteffect( + chromatin_profile_ref, + chromatin_profile_alt, + clustervfeat, + histone_inds) + max_abs_diff = np.abs(diffproj).max(axis=1) + + np.save(os.path.join(profile_pred_dir, "sequence_class_scores.npy"), diffproj) + if not no_tsv: + write_to_tsv(max_abs_diff, # max sequence class score + chromatin_profile_alt - chromatin_profile_ref, # chromatin profile diffs + diffproj, # sequence class diffs + chromatin_profiles, # chromatin profile targets + seqclass_names, # sequence class names + read_rowlabels_file(chromatin_profile_rowlabels, use_strand=False), + os.path.join(results_dir, "sorted.chromatin_profile_diffs.tsv"), + os.path.join(results_dir, "sorted.sequence_class_scores.tsv")) + diff --git a/sequence_class.py b/utils.py similarity index 62% rename from sequence_class.py rename to utils.py index af65891..63d835a 100755 --- a/sequence_class.py +++ b/utils.py @@ -1,28 +1,14 @@ -""" -Description: - This script is run after `vep_cli.py`. It computes the sequence-class - level variant effect scores based on Sei chromatin profile predictions, - sorts the variants based on the maximum absolute scores across sequence classes - and outputs the results as TSV files. - -Usage: - sequence_class.py - sequence_class.py -h | --help - -Options: - The results directory. - Name of VCF file. - -""" import os -from docopt import docopt import h5py import numpy as np import pandas as pd def get_targets(filename): + """ + Load the names of all the chromatin profiles predicted by Sei + """ targets = [] with open(filename, 'r') as file_handle: for line in file_handle: @@ -31,6 +17,9 @@ def get_targets(filename): def get_data(filename): + """ + Load HDF5 file of predictions into memory + """ fh = h5py.File(filename, 'r') data = fh["data"][()] fh.close() @@ -38,6 +27,9 @@ def get_data(filename): def read_rowlabels_file(rowlabels, use_strand=False): + """ + Read separate row labels file for variant effect prediction. + """ labels = [] with open(rowlabels, 'r') as file_handle: for line in file_handle: @@ -53,6 +45,32 @@ def read_rowlabels_file(rowlabels, use_strand=False): return labels +def sc_projection(chromatin_profile_preds, clustervfeat): + return (np.dot(chromatin_profile_pred, clustervfeat.T) / + np.linalg.norm(clustervfeat, axis=1)) + + +def sc_hnorm_varianteffect(chromatin_profile_ref, chromatin_profile_alt, clustervfeat, histone_inds): + chromatin_profile_ref_adjust = chromatin_profile_ref.copy() + chromatin_profile_ref_adjust[:, histone_inds] = \ + chromatin_profile_ref_adjust[:, histone_inds] * ( + (np.sum(chromatin_profile_ref[:, histone_inds], axis=1)*0.5 + + np.sum(chromatin_profile_alt[:, histone_inds], axis=1)*0.5) / + np.sum(chromatin_profile_ref[:, histone_inds], axis=1))[:, None] + + chromatin_profile_alt_adjust = chromatin_profile_alt.copy() + chromatin_profile_alt_adjust[:, histone_inds] = \ + chromatin_profile_alt_adjust[:, histone_inds] * ( + (np.sum(chromatin_profile_ref[:, histone_inds], axis=1)*0.5 + + np.sum(chromatin_profile_alt[:, histone_inds], axis=1)*0.5) / + np.sum(chromatin_profile_alt[:, histone_inds], axis=1))[:, None] + + refproj = sc_projection(chromatin_profile_ref_adjust, clustervfeat) + altproj = sc_projection(chromatin_profile_alt_adjust, clustervfeat) + diffproj = altproj[:,:40] - refproj[:,:40] + return diffproj + + def write_to_tsv(max_abs_diff, chromatin_profile_diffs, sequence_class_projscores, @@ -111,63 +129,3 @@ def write_to_tsv(max_abs_diff, -if __name__ == "__main__": - arguments = docopt( - __doc__, - version='1.0.0') - results_dir = arguments[''] - input_vcf = arguments[''] - - _, filename = os.path.split(input_vcf) - filename_prefix = '.'.join(filename.split('.')[:-1]) - - sei_dir = "./model" - chromatin_profiles = get_targets(os.path.join(sei_dir, "target.names")) - seqclass_names = get_targets(os.path.join(sei_dir, "seqclass.names")) - - profile_pred_dir = os.path.join(results_dir, 'chromatin-profiles-hdf5') - rowlabels_filename = "{0}_row_labels.txt".format(filename_prefix) - chromatin_profile_rowlabels = os.path.join(profile_pred_dir, rowlabels_filename) - - chromatin_profile_ref = get_data(os.path.join( - profile_pred_dir, - "{0}.ref_predictions.h5".format(filename_prefix))) - chromatin_profile_alt = get_data(os.path.join( - profile_pred_dir, - "{0}.alt_predictions.h5".format(filename_prefix))) - - clustervfeat = np.load(os.path.join(sei_dir, 'projvec_targets.npy')) - histone_inds = np.load(os.path.join(sei_dir, 'histone_inds.npy')) - - chromatin_profile_ref_adjust = chromatin_profile_ref.copy() - chromatin_profile_ref_adjust[:, histone_inds] = \ - chromatin_profile_ref_adjust[:, histone_inds] * ( - (np.sum(chromatin_profile_ref[:, histone_inds], axis=1)*0.5 + - np.sum(chromatin_profile_alt[:, histone_inds], axis=1)*0.5) / - np.sum(chromatin_profile_ref[:, histone_inds], axis=1))[:, None] - - chromatin_profile_alt_adjust = chromatin_profile_alt.copy() - chromatin_profile_alt_adjust[:, histone_inds] = \ - chromatin_profile_alt_adjust[:, histone_inds] * ( - (np.sum(chromatin_profile_ref[:, histone_inds], axis=1)*0.5 + - np.sum(chromatin_profile_alt[:, histone_inds], axis=1)*0.5) / - np.sum(chromatin_profile_alt[:, histone_inds], axis=1))[:, None] - - refproj = (np.dot(chromatin_profile_ref_adjust,clustervfeat.T) / - np.linalg.norm(clustervfeat,axis=1)) - altproj = (np.dot(chromatin_profile_alt_adjust,clustervfeat.T) / - np.linalg.norm(clustervfeat,axis=1)) - diffproj = altproj[:,:40] - refproj[:,:40] - - max_abs_diff = np.abs(diffproj).max(axis=1) - - write_to_tsv(max_abs_diff, # max sequence class score - chromatin_profile_alt - chromatin_profile_ref, # chromatin profile diffs - diffproj, # sequence class diffs - chromatin_profiles, # chromatin profile targets - seqclass_names, # sequence class names - read_rowlabels_file(chromatin_profile_rowlabels, use_strand=False), - os.path.join(results_dir, "chromatin_profile_diffs.tsv"), - os.path.join(results_dir, "sequence_class_scores.tsv")) - - From 9c88dfdac61d207ab124235a3511f8ab82e0ac52 Mon Sep 17 00:00:00 2001 From: Kathy Date: Tue, 27 Sep 2022 17:03:23 -0400 Subject: [PATCH 05/17] vep file restructuring --- sequence_class.from_vep.py | 95 ++++++++++++++++++++++++++------------ utils.py | 19 -------- 2 files changed, 66 insertions(+), 48 deletions(-) diff --git a/sequence_class.from_vep.py b/sequence_class.from_vep.py index 664ed2f..6deabc0 100755 --- a/sequence_class.from_vep.py +++ b/sequence_class.from_vep.py @@ -6,19 +6,30 @@ across sequence classes and outputs the results as TSV files. Usage: - sequence_class.from_vep.py [--no-tsv] + sequence_class.from_vep.py + [--out-name=] + [--no-tsv] sequence_class.from_vep.py -h | --help Options: - The results directory. - Name of VCF file. - --no-tsv The TSVs outputted sort the variants based on maximum - absolute scores across sequence classes and are intended - for easier perusal of the predictions for those more - familiar with this file type, but makes this script take - much longer to complete as a result. If you are comfortable - working with HDF5 and NPY files, you can suppress the TSV - output and use the files in `chromatin-profiles-hdf5`. + Reference sequence Sei predictions file. Assumes + the row labels file is in the same directory. The resulting + variant effect sequence class scores will be computed as + alt - ref sequence class scores (adjusted for nucleosome + occupancy). + Alternate sequence Sei predictions file. + The directory to output the sequence class scores + & TSVs (if `--no-tsv` is not used). + --out-name= Specify an output filename prefix that all outputted + files will use. Otherwise, filenames will be based on + . + --no-tsv The TSVs outputted sort the variants based on maximum + absolute scores across sequence classes and are intended + for easier perusal of the predictions for those more + familiar with this file type, but makes this script take + much longer to complete as a result. If you are comfortable + working with HDF5 and NPY files, you can suppress the TSV + output and use the files in `chromatin-profiles-hdf5`. """ import os @@ -38,28 +49,53 @@ arguments = docopt( __doc__, version='1.0.0') - results_dir = arguments[''] - input_vcf = arguments[''] - no_tsv = arguments['--no-tsv'] + ref_pred_file = arguments[''] + alt_pred_file = arguments[''] + + # load predictions + chromatin_profile_ref = get_data(ref_pred_file) + chromatin_profile_alt = get_data(alt_pred_file) + if len(chromatin_profile_ref) != len(chromatin_profile_alt): + raise ValueError(("{0} and {1} have different number of rows: {2} vs {3}, " + "respectively.").format(ref_pred_file, alt_pred_file, + len(chromatin_profile_ref), + len(chromatin_profile_alt))) + + ref_dir, _ = os.path.split(ref_pred_file) + alt_dir, _ = os.path.split(alt_pred_file) + + # checks if the ref/alt are from variant effect prediction (VCF) + # or sequence prediction (BED or FASTA file inputs) + seq_from = None + alt_prefix = None + if '.ref_predictions' in ref_pred_file and '.alt_predictions' in alt_pred_file: + seq_from = 'VCF' + alt_prefix = os.path.basename(alt_pred_file).split('.alt_predictions')[0] + elif '.alt_predictions' in ref_pred_file and 'ref_predictions' in ref_pred_file: + seq_from = 'VCF' + alt_prefix = os.path.basename(alt_pred_file).split('.ref_predictions')[0] + elif '_predictions' in ref_pred_file and '_predictions' in alt_pred_file: + seq_from = 'BED/FASTA' + alt_prefix = os.path.basename(alt_pred_file).split('_predictions')[0] + rowlabels_file = os.path.join(alt_dir, '{0}_row_labels.txt'.format(alt_prefix)) + rowlabels = pd.read_csv(rowlabels, sep='\t') + if len(rowlabels) != len(chromatin_profile_alt): + raise ValueError(("Rowlabels file '{0}' does not have the same number " + "of rows as '{1}'").format(rowlabels_file, alt_pred_file)) + + output_dir = arguments[''] - _, filename = os.path.split(input_vcf) - filename_prefix = '.'.join(filename.split('.')[:-1]) + output_prefix = arguments['--out-name'] + if output_prefix is None: + output_prefix = alt_prefix + print("Output files will start with {0}".format(output_prefix)) + + no_tsv = arguments['--no-tsv'] sei_dir = "./model" chromatin_profiles = get_targets(os.path.join(sei_dir, "target.names")) seqclass_names = get_targets(os.path.join(sei_dir, "seqclass.names")) - profile_pred_dir = os.path.join(results_dir, 'chromatin-profiles-hdf5') - rowlabels_filename = "{0}_row_labels.txt".format(filename_prefix) - chromatin_profile_rowlabels = os.path.join(profile_pred_dir, rowlabels_filename) - - chromatin_profile_ref = get_data(os.path.join( - profile_pred_dir, - "{0}.ref_predictions.h5".format(filename_prefix))) - chromatin_profile_alt = get_data(os.path.join( - profile_pred_dir, - "{0}.alt_predictions.h5".format(filename_prefix))) - clustervfeat = np.load(os.path.join(sei_dir, 'projvec_targets.npy')) histone_inds = np.load(os.path.join(sei_dir, 'histone_inds.npy')) @@ -70,7 +106,8 @@ histone_inds) max_abs_diff = np.abs(diffproj).max(axis=1) - np.save(os.path.join(profile_pred_dir, "sequence_class_scores.npy"), diffproj) + np.save(os.path.join(output_dir, "sequence_class_scores.npy"), diffproj) + if not no_tsv: write_to_tsv(max_abs_diff, # max sequence class score chromatin_profile_alt - chromatin_profile_ref, # chromatin profile diffs @@ -78,6 +115,6 @@ chromatin_profiles, # chromatin profile targets seqclass_names, # sequence class names read_rowlabels_file(chromatin_profile_rowlabels, use_strand=False), - os.path.join(results_dir, "sorted.chromatin_profile_diffs.tsv"), - os.path.join(results_dir, "sorted.sequence_class_scores.tsv")) + os.path.join(output_dir, "sorted.chromatin_profile_diffs.tsv"), + os.path.join(output_dir, "sorted.sequence_class_scores.tsv")) diff --git a/utils.py b/utils.py index 63d835a..ec5c01b 100755 --- a/utils.py +++ b/utils.py @@ -26,25 +26,6 @@ def get_data(filename): return data -def read_rowlabels_file(rowlabels, use_strand=False): - """ - Read separate row labels file for variant effect prediction. - """ - labels = [] - with open(rowlabels, 'r') as file_handle: - for line in file_handle: - if 'contains_unk' in line: - continue - cols = line.strip().split('\t') - chrom, pos, id, ref, alt, strand, ref_match, contains_unk = cols - if not use_strand: - strand = '.' - labels.append( - (chrom, pos, id, ref, alt, strand, - ref_match, contains_unk)) - return labels - - def sc_projection(chromatin_profile_preds, clustervfeat): return (np.dot(chromatin_profile_pred, clustervfeat.T) / np.linalg.norm(clustervfeat, axis=1)) From c142a5b7b3429e6b9f5d85886a7913390e6a9887 Mon Sep 17 00:00:00 2001 From: Kathy Date: Tue, 27 Sep 2022 17:37:11 -0400 Subject: [PATCH 06/17] update write to tsv function --- utils.py | 56 +++++++++++++++++++++----------------------------------- 1 file changed, 21 insertions(+), 35 deletions(-) diff --git a/utils.py b/utils.py index ec5c01b..4f25b50 100755 --- a/utils.py +++ b/utils.py @@ -61,51 +61,37 @@ def write_to_tsv(max_abs_diff, output_chromatin_profile_file, output_sequence_class_file): sorted_sc_abs_diff = np.sort(max_abs_diff)[::-1] + sorted_maxsc_df = pd.DataFrame(sorted_sc_absdiff, # dataframe + columns=['seqclass_max_absdiff']) + sorted_ixs = np.argsort(max_abs_diff)[::-1] assert len(sorted_ixs) == chromatin_profile_diffs.shape[0] - rowlabels = np.array(rowlabels)[sorted_ixs] + sorted_rowlabels = rowlabels.iloc[sorted_ixs] # dataframe + rowlabel_columns = sorted_rowlabels.columns.tolist() + + # sorted now chromatin_profile_diffs = chromatin_profile_diffs[sorted_ixs, :] sequence_class_projscores = sequence_class_projscores[sorted_ixs,:] - chromatin_profile_rows = [] - sequence_class_rows = [] - for ix, r in enumerate(rowlabels): - ref_match = r[-2] - contains_unk = r[-1] - label = r[:-2] - chromatin_profile_row = np.hstack( - [[sorted_sc_abs_diff[ix], ref_match, contains_unk], - label, - chromatin_profile_diffs[ix, :]]).tolist() - sequence_class_row = np.hstack( - [[sorted_sc_abs_diff[ix], ref_match, contains_unk], - label, - sequence_class_projscores[ix, :]]).tolist() - chromatin_profile_rows.append(chromatin_profile_row) - sequence_class_rows.append(sequence_class_row) + # dataframes + sorted_profiles_df = pd.DataFrame(chromatin_profile_diffs, columns=chromatin_profiles) + sorted_sc_df = pd.DataFrame(sequence_class_projscores, columns=seqclass_names) del chromatin_profile_diffs - colnames = [ - 'seqclass_max_absdiff', 'ref_match', 'contains_unk', - 'chrom', 'pos', 'id', 'ref', 'alt', 'strand'] - - sc_df = pd.DataFrame( - sequence_class_rows, columns=colnames + seqclass_names) - check_columns = sc_df.columns.tolist() - check_columns.remove('strand') - sc_df.drop_duplicates(subset=check_columns, inplace=True) - sc_df.to_csv(output_sequence_class_file, sep='\t', index=False) - - cp_df = pd.DataFrame( - chromatin_profile_rows, columns=colnames + chromatin_profiles) - check_columns = cp_df.columns.tolist() - check_columns.remove('strand') - cp_df.drop_duplicates(subset=check_columns, inplace=True) + + sei_df = pd.concat([sorted_maxsc_df, sorted_rowlabels, sorted_profiles_df], + axis=1) + sc_df = pd.concat([sorted_maxsc_df, sorted_rowlabels, sorted_sc_df], + axis=1) + + sc_df[['seqclass_max_absdiff'] + rowlabel_columns + seqclass_names].to_csv( + output_sequence_class_file, sep='\t', index=False) + if len(cp_df) > 10000: - cp_df.to_csv( + cp_df[['seqclass_max_absdiff'] + rowlabel_columns + chromatin_profiles].to_csv( output_chromatin_profile_file, sep='\t', index=False, compression='gzip') else: - cp_df.to_csv( + cp_df[['seqclass_max_absdiff'] + rowlabel_columns + chromatin_profiles].to_csv( output_chromatin_profile_file, sep='\t', index=False) From 2bd9de88c81017c3b57b1351adf1cf9ac24ce1ee Mon Sep 17 00:00:00 2001 From: Kathy Date: Wed, 28 Sep 2022 12:23:27 -0400 Subject: [PATCH 07/17] testing VEP script --- sequence_class.from_vep.py | 17 +++++++++++------ utils.py | 13 ++++++------- 2 files changed, 17 insertions(+), 13 deletions(-) diff --git a/sequence_class.from_vep.py b/sequence_class.from_vep.py index 6deabc0..3345666 100755 --- a/sequence_class.from_vep.py +++ b/sequence_class.from_vep.py @@ -40,7 +40,6 @@ import pandas as pd from utils import get_targets, get_data -from utils import read_rowlabels_file from utils import sc_hnorm_varianteffect from utils import write_to_tsv @@ -78,12 +77,13 @@ seq_from = 'BED/FASTA' alt_prefix = os.path.basename(alt_pred_file).split('_predictions')[0] rowlabels_file = os.path.join(alt_dir, '{0}_row_labels.txt'.format(alt_prefix)) - rowlabels = pd.read_csv(rowlabels, sep='\t') + rowlabels = pd.read_csv(rowlabels_file, sep='\t') if len(rowlabels) != len(chromatin_profile_alt): raise ValueError(("Rowlabels file '{0}' does not have the same number " "of rows as '{1}'").format(rowlabels_file, alt_pred_file)) output_dir = arguments[''] + os.makedirs(output_dir, exist_ok=True) output_prefix = arguments['--out-name'] if output_prefix is None: @@ -106,7 +106,8 @@ histone_inds) max_abs_diff = np.abs(diffproj).max(axis=1) - np.save(os.path.join(output_dir, "sequence_class_scores.npy"), diffproj) + np.save(os.path.join( + output_dir, "{0}.sequence_class_scores.npy".format(output_prefix)), diffproj) if not no_tsv: write_to_tsv(max_abs_diff, # max sequence class score @@ -114,7 +115,11 @@ diffproj, # sequence class diffs chromatin_profiles, # chromatin profile targets seqclass_names, # sequence class names - read_rowlabels_file(chromatin_profile_rowlabels, use_strand=False), - os.path.join(output_dir, "sorted.chromatin_profile_diffs.tsv"), - os.path.join(output_dir, "sorted.sequence_class_scores.tsv")) + rowlabels, + os.path.join(output_dir, + "sorted.{0}.chromatin_profile_diffs.tsv".format( + output_prefix)), + os.path.join(output_dir, + "sorted.{0}.sequence_class_scores.tsv".format( + output_prefix))) diff --git a/utils.py b/utils.py index 4f25b50..727141e 100755 --- a/utils.py +++ b/utils.py @@ -27,7 +27,7 @@ def get_data(filename): def sc_projection(chromatin_profile_preds, clustervfeat): - return (np.dot(chromatin_profile_pred, clustervfeat.T) / + return (np.dot(chromatin_profile_preds, clustervfeat.T) / np.linalg.norm(clustervfeat, axis=1)) @@ -61,13 +61,13 @@ def write_to_tsv(max_abs_diff, output_chromatin_profile_file, output_sequence_class_file): sorted_sc_abs_diff = np.sort(max_abs_diff)[::-1] - sorted_maxsc_df = pd.DataFrame(sorted_sc_absdiff, # dataframe + sorted_maxsc_df = pd.DataFrame(sorted_sc_abs_diff, # dataframe columns=['seqclass_max_absdiff']) sorted_ixs = np.argsort(max_abs_diff)[::-1] - assert len(sorted_ixs) == chromatin_profile_diffs.shape[0] sorted_rowlabels = rowlabels.iloc[sorted_ixs] # dataframe + sorted_rowlabels.reset_index(inplace=True) rowlabel_columns = sorted_rowlabels.columns.tolist() # sorted now @@ -83,15 +83,14 @@ def write_to_tsv(max_abs_diff, axis=1) sc_df = pd.concat([sorted_maxsc_df, sorted_rowlabels, sorted_sc_df], axis=1) - sc_df[['seqclass_max_absdiff'] + rowlabel_columns + seqclass_names].to_csv( output_sequence_class_file, sep='\t', index=False) - if len(cp_df) > 10000: - cp_df[['seqclass_max_absdiff'] + rowlabel_columns + chromatin_profiles].to_csv( + if len(sei_df) > 10000: + sei_df[['seqclass_max_absdiff'] + rowlabel_columns + chromatin_profiles].to_csv( output_chromatin_profile_file, sep='\t', index=False, compression='gzip') else: - cp_df[['seqclass_max_absdiff'] + rowlabel_columns + chromatin_profiles].to_csv( + sei_df[['seqclass_max_absdiff'] + rowlabel_columns + chromatin_profiles].to_csv( output_chromatin_profile_file, sep='\t', index=False) From 359c9dbec9e72f2126d1b68ce7f23de24ddc002a Mon Sep 17 00:00:00 2001 From: Kathy Date: Wed, 28 Sep 2022 13:12:07 -0400 Subject: [PATCH 08/17] start working on sequence prediction --- get_raw_sc_score.py | 87 +++++++++++++++ ...m_vep.py => get_variant_effect_sc_score.py | 25 ++--- sequence_class.from_seq_prediction.py | 103 ------------------ utils.py | 13 +++ 4 files changed, 107 insertions(+), 121 deletions(-) create mode 100755 get_raw_sc_score.py rename sequence_class.from_vep.py => get_variant_effect_sc_score.py (83%) delete mode 100755 sequence_class.from_seq_prediction.py diff --git a/get_raw_sc_score.py b/get_raw_sc_score.py new file mode 100755 index 0000000..f4c41ae --- /dev/null +++ b/get_raw_sc_score.py @@ -0,0 +1,87 @@ +""" +Description: + This script is run after `seq_prediction_cli.py`. It computes the + sequence class-level scores based on Sei chromatin profile predictions, + and by default, sorts the variants based on the maximum absolute scores + across sequence classes and outputs the results as TSV files. + +Usage: + get_raw_sc_score.py + [--out-name=] + [--no-tsv] + get_raw_sc_score.py -h | --help + +Options: + Path to Sei sequence predictions to compute raw + sequence class projection scores (NO variant effect). + The directory to output the sequence class scores + & TSVs (if `--no-tsv` is not used). + --out-name= Specify an output filename prefix that all outputted + files will use. Otherwise, filenames will be based on + . + --no-tsv The TSVs outputted sort the variants based on maximum + absolute scores across sequence classes and are intended + for easier perusal of the predictions for those more + familiar with this file type, but makes this script take + much longer to complete as a result. If you are comfortable + working with HDF5 and NPY files, you can suppress the TSV + output and use the files in `chromatin-profiles-hdf5`. + --no-chromprof-tsv TODO +""" +import os + +from docopt import docopt +import h5py +import numpy as np +import pandas as pd + +from utils import get_filename_prefix, get_data, get_targets +from utils import sc_projection +from utils import write_to_tsv + + +if __name__ == "__main__": + arguments = docopt( + __doc__, + version='1.0.0') + input_pred_file = arguments[''] + input_preds = get_data(input_pred_file) + input_dir, input_fn = os.path.split(input_pred_file) + + input_prefix = get_filename_prefix(input_fn) + rowlabels_file = os.path.join(input_dir, '{0}_row_labels.txt'.format( + input_prefix)) + rowlabels = pd.read_csv(rowlabels_file, sep='\t') + if len(rowlabels) != len(input_preds): + raise ValueError(("Rowlabels file '{0}' does not have the same number " + "of rows as '{1}'").format(rowlabels_file, + input_pred_file)) + + output_prefix = arguments['--out-name'] + if output_prefix is None: + output_prefix = input_prefix + print("Output files will start with {0}".format(output_prefix)) + + no_tsv = arguments['--no-tsv'] + + sei_dir = "./model" + chromatin_profiles = get_targets(os.path.join(sei_dir, "target.names")) + seqclass_names = get_targets(os.path.join(sei_dir, "seqclass.names")) + + clustervfeat = np.load(os.path.join(sei_dir, 'projvec_targets.npy')) + + projscores = sc_projection(input_preds, clustervfeat) + np.save(os.path.join( + output, "{0}.raw_sequence_class_scores.npy".format(output_prefix)), + projscores) + + if not no_tsv: + write_to_tsv(max_abs_diff, # max sequence class score + chromatin_profile_alt - chromatin_profile_ref, # chromatin profile diffs + diffproj, # sequence class diffs + chromatin_profiles, # chromatin profile targets + seqclass_names, # sequence class names + read_rowlabels_file(chromatin_profile_rowlabels, use_strand=False), + os.path.join(results_dir, "sorted.chromatin_profile_diffs.tsv"), + os.path.join(results_dir, "sorted.sequence_class_scores.tsv")) + diff --git a/sequence_class.from_vep.py b/get_variant_effect_sc_score.py similarity index 83% rename from sequence_class.from_vep.py rename to get_variant_effect_sc_score.py index 3345666..9338bfa 100755 --- a/sequence_class.from_vep.py +++ b/get_variant_effect_sc_score.py @@ -6,10 +6,10 @@ across sequence classes and outputs the results as TSV files. Usage: - sequence_class.from_vep.py - [--out-name=] - [--no-tsv] - sequence_class.from_vep.py -h | --help + get_variant_effect_sc_score.py + [--out-name=] + [--no-tsv] + get_variant_effect_sc_score.py -h | --help Options: Reference sequence Sei predictions file. Assumes @@ -39,7 +39,7 @@ import numpy as np import pandas as pd -from utils import get_targets, get_data +from utils import get_filename_prefix, get_data, get_targets from utils import sc_hnorm_varianteffect from utils import write_to_tsv @@ -60,22 +60,11 @@ len(chromatin_profile_ref), len(chromatin_profile_alt))) - ref_dir, _ = os.path.split(ref_pred_file) - alt_dir, _ = os.path.split(alt_pred_file) + alt_dir, alt_fn = os.path.split(alt_pred_file) # checks if the ref/alt are from variant effect prediction (VCF) # or sequence prediction (BED or FASTA file inputs) - seq_from = None - alt_prefix = None - if '.ref_predictions' in ref_pred_file and '.alt_predictions' in alt_pred_file: - seq_from = 'VCF' - alt_prefix = os.path.basename(alt_pred_file).split('.alt_predictions')[0] - elif '.alt_predictions' in ref_pred_file and 'ref_predictions' in ref_pred_file: - seq_from = 'VCF' - alt_prefix = os.path.basename(alt_pred_file).split('.ref_predictions')[0] - elif '_predictions' in ref_pred_file and '_predictions' in alt_pred_file: - seq_from = 'BED/FASTA' - alt_prefix = os.path.basename(alt_pred_file).split('_predictions')[0] + alt_prefix = get_filename_prefix(alt_fn) rowlabels_file = os.path.join(alt_dir, '{0}_row_labels.txt'.format(alt_prefix)) rowlabels = pd.read_csv(rowlabels_file, sep='\t') if len(rowlabels) != len(chromatin_profile_alt): diff --git a/sequence_class.from_seq_prediction.py b/sequence_class.from_seq_prediction.py deleted file mode 100755 index 184819f..0000000 --- a/sequence_class.from_seq_prediction.py +++ /dev/null @@ -1,103 +0,0 @@ -""" -Description: - This script is run after `seq_prediction_cli.py`. It computes the - sequence class-level scores based on Sei chromatin profile predictions, - and by default, sorts the variants based on the maximum absolute scores - across sequence classes and outputs the results as TSV files. - -Usage: - sequence_class.from_seq_prediction.py - [--input=] - [--ref=] [--alt=] - [--no-tsv] - sequence_class.from_seq_prediction.py -h | --help - -Options: - The results directory. - --input= Path to sequence predictions to compute sequence - class projection scores (NO variant effect). - Use `--input` ONLY or `--ref` and `--alt` ONLY. - If `--input` is used `--ref` and `--alt` will be ignored. - --ref= Path to reference sequence predictions to compute - sequence class variant effect scores. - `--alt` must also be used. - --alt= Path to the alternate sequence predictions to - compute sequence class variant effect scores. - `--ref` must also be used. - --no-tsv The TSVs outputted sort the variants based on maximum - absolute scores across sequence classes and are intended - for easier perusal of the predictions for those more - familiar with this file type, but makes this script take - much longer to complete as a result. If you are comfortable - working with HDF5 and NPY files, you can suppress the TSV - output and use the files in `chromatin-profiles-hdf5`. - -""" -import os - -from docopt import docopt -import h5py -import numpy as np -import pandas as pd - -from utils import get_targets, get_data -from utils import read_rowlabels_file -from utils import sc_hnorm_varianteffect -from utils import write_to_tsv - - -if __name__ == "__main__": - arguments = docopt( - __doc__, - version='1.0.0') - results_dir = arguments[''] - input_pred = arguments['--input'] - ref_pred = arguments['--ref'] - alt_pred = arguments['--alt'] - no_tsv = arguments['--no-tsv'] - - if input_pred is None and ref_pred is None: - - print(arguments) - import sys - sys.exit(0) - - _, filename = os.path.split(input_vcf) - filename_prefix = '.'.join(filename.split('.')[:-1]) - - sei_dir = "./model" - chromatin_profiles = get_targets(os.path.join(sei_dir, "target.names")) - seqclass_names = get_targets(os.path.join(sei_dir, "seqclass.names")) - - profile_pred_dir = os.path.join(results_dir, 'chromatin-profiles-hdf5') - rowlabels_filename = "{0}_row_labels.txt".format(filename_prefix) - chromatin_profile_rowlabels = os.path.join(profile_pred_dir, rowlabels_filename) - - chromatin_profile_ref = get_data(os.path.join( - profile_pred_dir, - "{0}.ref_predictions.h5".format(filename_prefix))) - chromatin_profile_alt = get_data(os.path.join( - profile_pred_dir, - "{0}.alt_predictions.h5".format(filename_prefix))) - - clustervfeat = np.load(os.path.join(sei_dir, 'projvec_targets.npy')) - histone_inds = np.load(os.path.join(sei_dir, 'histone_inds.npy')) - - diffproj = sc_hnorm_varianteffect( - chromatin_profile_ref, - chromatin_profile_alt, - clustervfeat, - histone_inds) - max_abs_diff = np.abs(diffproj).max(axis=1) - - np.save(os.path.join(profile_pred_dir, "sequence_class_scores.npy"), diffproj) - if not no_tsv: - write_to_tsv(max_abs_diff, # max sequence class score - chromatin_profile_alt - chromatin_profile_ref, # chromatin profile diffs - diffproj, # sequence class diffs - chromatin_profiles, # chromatin profile targets - seqclass_names, # sequence class names - read_rowlabels_file(chromatin_profile_rowlabels, use_strand=False), - os.path.join(results_dir, "sorted.chromatin_profile_diffs.tsv"), - os.path.join(results_dir, "sorted.sequence_class_scores.tsv")) - diff --git a/utils.py b/utils.py index 727141e..11fa3fd 100755 --- a/utils.py +++ b/utils.py @@ -52,6 +52,19 @@ def sc_hnorm_varianteffect(chromatin_profile_ref, chromatin_profile_alt, cluster return diffproj +def get_filename_prefix(filename): + """Filename must follow Selene output file conventions. + """ + prefix = None + if '.alt_predictions' in filename: + prefix = filename.split('.alt_predictions')[0] + elif '.ref_predictions' in filename: + prefix = filename.split('.ref_predictions')[0] + else: + prefix = filename.split('_predictions')[0] + return prefix + + def write_to_tsv(max_abs_diff, chromatin_profile_diffs, sequence_class_projscores, From c2977815f2e2eff6328aed6461e47edb6d69c51c Mon Sep 17 00:00:00 2001 From: Kathy Date: Mon, 3 Oct 2022 16:48:11 -0400 Subject: [PATCH 09/17] bugfixes to scripts --- get_raw_sc_score.py | 30 ++++++------------------------ seq_prediction_cli.py | 7 +++---- 2 files changed, 9 insertions(+), 28 deletions(-) diff --git a/get_raw_sc_score.py b/get_raw_sc_score.py index f4c41ae..6421616 100755 --- a/get_raw_sc_score.py +++ b/get_raw_sc_score.py @@ -8,7 +8,6 @@ Usage: get_raw_sc_score.py [--out-name=] - [--no-tsv] get_raw_sc_score.py -h | --help Options: @@ -19,14 +18,7 @@ --out-name= Specify an output filename prefix that all outputted files will use. Otherwise, filenames will be based on . - --no-tsv The TSVs outputted sort the variants based on maximum - absolute scores across sequence classes and are intended - for easier perusal of the predictions for those more - familiar with this file type, but makes this script take - much longer to complete as a result. If you are comfortable - working with HDF5 and NPY files, you can suppress the TSV - output and use the files in `chromatin-profiles-hdf5`. - --no-chromprof-tsv TODO + """ import os @@ -37,13 +29,15 @@ from utils import get_filename_prefix, get_data, get_targets from utils import sc_projection -from utils import write_to_tsv if __name__ == "__main__": arguments = docopt( __doc__, version='1.0.0') + output_dir = arguments[''] + os.makedirs(output_dir, exist_ok=True) + input_pred_file = arguments[''] input_preds = get_data(input_pred_file) input_dir, input_fn = os.path.split(input_pred_file) @@ -59,11 +53,9 @@ output_prefix = arguments['--out-name'] if output_prefix is None: - output_prefix = input_prefix + output_prefix = input_fn.split('_predictions')[0] print("Output files will start with {0}".format(output_prefix)) - no_tsv = arguments['--no-tsv'] - sei_dir = "./model" chromatin_profiles = get_targets(os.path.join(sei_dir, "target.names")) seqclass_names = get_targets(os.path.join(sei_dir, "seqclass.names")) @@ -72,16 +64,6 @@ projscores = sc_projection(input_preds, clustervfeat) np.save(os.path.join( - output, "{0}.raw_sequence_class_scores.npy".format(output_prefix)), + output_dir, "{0}.raw_sequence_class_scores.npy".format(output_prefix)), projscores) - if not no_tsv: - write_to_tsv(max_abs_diff, # max sequence class score - chromatin_profile_alt - chromatin_profile_ref, # chromatin profile diffs - diffproj, # sequence class diffs - chromatin_profiles, # chromatin profile targets - seqclass_names, # sequence class names - read_rowlabels_file(chromatin_profile_rowlabels, use_strand=False), - os.path.join(results_dir, "sorted.chromatin_profile_diffs.tsv"), - os.path.join(results_dir, "sorted.sequence_class_scores.tsv")) - diff --git a/seq_prediction_cli.py b/seq_prediction_cli.py index d8e8cf7..a85c801 100755 --- a/seq_prediction_cli.py +++ b/seq_prediction_cli.py @@ -12,9 +12,9 @@ Input FASTA or BED file. Output directory --genome= If is a BED file, specify the reference - genome hg38 or hg19 [default: hg19] + genome hg38 or hg19 [default: hg19] --cuda Run variant effect prediction on a CUDA-enabled - GPU + GPU """ import os @@ -56,7 +56,7 @@ def _finditem(obj, val): configs = load_path("./model/sei_seq_prediction.yml", instantiate=False) _finditem(configs, use_dir) - configs["prediction"].bind(input=seq_input, output_dir=sei_out) + configs["prediction"].update(input_path=seq_input, output_dir=sei_out) configs["analyze_sequences"].bind(use_cuda=use_cuda) if not seq_input.endswith('.fa') and not seq_input.endswith('.fasta'): @@ -68,6 +68,5 @@ def _finditem(obj, val): configs["analyze_sequences"].bind(reference_sequence=genome) else: raise ValueError("--genome= must be 'hg19' or 'hg38'") - parse_configs_and_run(configs) From 0b8cb2b2541277629f4efafe4497d3fe613d7c8e Mon Sep 17 00:00:00 2001 From: Kathy Date: Tue, 4 Oct 2022 15:23:53 -0400 Subject: [PATCH 10/17] adjust names --- seq_prediction_cli.py => 1_sequence_prediction.py | 9 +++++---- vep_cli.py => 1_variant_effect_prediction.py | 4 ++-- get_raw_sc_score.py => 2_raw_sc_score.py | 8 ++++---- ...t_effect_sc_score.py => 2_varianteffect_sc_score.py | 10 +++++----- 4 files changed, 16 insertions(+), 15 deletions(-) rename seq_prediction_cli.py => 1_sequence_prediction.py (91%) rename vep_cli.py => 1_variant_effect_prediction.py (94%) rename get_raw_sc_score.py => 2_raw_sc_score.py (92%) rename get_variant_effect_sc_score.py => 2_varianteffect_sc_score.py (94%) diff --git a/seq_prediction_cli.py b/1_sequence_prediction.py similarity index 91% rename from seq_prediction_cli.py rename to 1_sequence_prediction.py index a85c801..3bba021 100755 --- a/seq_prediction_cli.py +++ b/1_sequence_prediction.py @@ -1,15 +1,16 @@ """ Description: - CLI for sequence prediction using the Sei deep learning model. + CLI for sequence prediction using the Sei deep learning model, + given an input FASTA or BED file. Outputs Sei chromatin profile predictions. Usage: - seq_prediction_cli.py [--genome=] [--cuda] - seq_prediction_cli.py -h | --help + 1_sequence_prediction.py [--genome=] [--cuda] + 1_sequence_prediction.py -h | --help Options: -h --help Show this screen. - Input FASTA or BED file. + Input FASTA or BED file. Output directory --genome= If is a BED file, specify the reference genome hg38 or hg19 [default: hg19] diff --git a/vep_cli.py b/1_variant_effect_prediction.py similarity index 94% rename from vep_cli.py rename to 1_variant_effect_prediction.py index dbfb269..2ce5fea 100755 --- a/vep_cli.py +++ b/1_variant_effect_prediction.py @@ -4,8 +4,8 @@ Outputs both Sei chromatin profile predictions and sequence class scores. Usage: - vep_cli.py [--genome=] [--cuda] - vep_cli.py -h | --help + 1_variant_effect_prediction.py [--genome=] [--cuda] + 1_variant_effect_prediction.py -h | --help Options: -h --help Show this screen. diff --git a/get_raw_sc_score.py b/2_raw_sc_score.py similarity index 92% rename from get_raw_sc_score.py rename to 2_raw_sc_score.py index 6421616..cc3fb89 100755 --- a/get_raw_sc_score.py +++ b/2_raw_sc_score.py @@ -6,9 +6,9 @@ across sequence classes and outputs the results as TSV files. Usage: - get_raw_sc_score.py - [--out-name=] - get_raw_sc_score.py -h | --help + 2_raw_sc_score.py + [--out-name=] + 2_raw_sc_score.py -h | --help Options: Path to Sei sequence predictions to compute raw @@ -54,7 +54,7 @@ output_prefix = arguments['--out-name'] if output_prefix is None: output_prefix = input_fn.split('_predictions')[0] - print("Output files will start with {0}".format(output_prefix)) + print("Output files will start with prefix '{0}'".format(output_prefix)) sei_dir = "./model" chromatin_profiles = get_targets(os.path.join(sei_dir, "target.names")) diff --git a/get_variant_effect_sc_score.py b/2_varianteffect_sc_score.py similarity index 94% rename from get_variant_effect_sc_score.py rename to 2_varianteffect_sc_score.py index 9338bfa..8b88db0 100755 --- a/get_variant_effect_sc_score.py +++ b/2_varianteffect_sc_score.py @@ -6,10 +6,10 @@ across sequence classes and outputs the results as TSV files. Usage: - get_variant_effect_sc_score.py - [--out-name=] - [--no-tsv] - get_variant_effect_sc_score.py -h | --help + 2_varianteffect_sc_score.py + [--out-name=] + [--no-tsv] + 2_varianteffect_sc_score.py -h | --help Options: Reference sequence Sei predictions file. Assumes @@ -77,7 +77,7 @@ output_prefix = arguments['--out-name'] if output_prefix is None: output_prefix = alt_prefix - print("Output files will start with {0}".format(output_prefix)) + print("Output files will start with prefix '{0}'".format(output_prefix)) no_tsv = arguments['--no-tsv'] From 59e1b13780d9d8dee6742921b61241ada352eccf Mon Sep 17 00:00:00 2001 From: Kathy Date: Tue, 4 Oct 2022 15:24:27 -0400 Subject: [PATCH 11/17] add some bash scripts --- 1_sequence_prediction.sh | 49 ++++++++++++++++++++++++++++++++++ 1_variant_effect_prediction.sh | 46 +++++++++++++++++++++++++++++++ 2_raw_sc_score.sh | 23 ++++++++++++++++ 2_varianteffect_sc_score.sh | 33 +++++++++++++++++++++++ 4 files changed, 151 insertions(+) create mode 100755 1_sequence_prediction.sh create mode 100755 1_variant_effect_prediction.sh create mode 100755 2_raw_sc_score.sh create mode 100755 2_varianteffect_sc_score.sh diff --git a/1_sequence_prediction.sh b/1_sequence_prediction.sh new file mode 100755 index 0000000..c44c859 --- /dev/null +++ b/1_sequence_prediction.sh @@ -0,0 +1,49 @@ +#!/usr/bin/env bash + +##################################################################### +# Example script for running Sei deep learning model sequence +# prediction with Selene + +# Usage: +# sh 1_sequence_prediction.sh --cuda + +# Please only specify hg38 or hg19 as input for if is +# a BED file. If you are using a FASTA file of sequences, you can specify +# whatever genome version you wish for your own reference (will be printed +# as part of output for this script) but do not leave it empty. + +# --cuda is optional, use if you are running on a CUDA-enabled +# GPU machine (see 1_example_seqpred.slurm_gpu.sh) +##################################################################### + +set -o errexit +set -o pipefail +set -o nounset + +input_filepath="${1:-}" +hg_version="${2:-}" +out_dir="${3:-}" +cuda="${4:-}" + +mkdir -p $out_dir + +input_basename=$(basename $input_filepath) +cp $input_filepath $out_dir/ + +echo "Input argments: $input_filepath $out_dir $hg_version $cuda" + +if [ "$cuda" = "--cuda" ] +then + echo "use_cuda: True" + python -u 1_sequence_prediction.py \ + "$out_dir/$input_basename" \ + $out_dir \ + --genome=${hg_version} \ + --cuda +else + echo "use_cuda: False" + python -u 1_sequence_prediction.py \ + "$out_dir/$input_basename" \ + $out_dir \ + --genome=${hg_version} +fi diff --git a/1_variant_effect_prediction.sh b/1_variant_effect_prediction.sh new file mode 100755 index 0000000..f2e7a08 --- /dev/null +++ b/1_variant_effect_prediction.sh @@ -0,0 +1,46 @@ +#!/usr/bin/env bash + +############################################################### +# Example script for running the variant effect prediction +# pipeline for Sei and sequence classes. + +# Usage: +# sh run_pipeline.sh [--cuda] + +# Please only specify hg38 or hg19 as input for . + +# --cuda is optional, use if you are running on a CUDA-enabled +# GPU machine (see 1_example_vep.slurm_gpu.sh) +############################################################### + +set -o errexit +set -o pipefail +set -o nounset + +vcf_filepath="${1:-}" +hg_version="${2:-}" +outdir="${3:-}" +cuda="${4:-}" + +mkdir -p $outdir + +echo "Input arguments: $vcf_filepath $hg_version $outdir $cuda" + +vcf_basename=$(basename $vcf_filepath) +cp $vcf_filepath $outdir/ + +if [ "$cuda" = "--cuda" ] +then + echo "use_cuda: True" + python -u 1_variant_effect_prediction.py \ + "$outdir/$vcf_basename" \ + $outdir \ + --genome=${hg_version} \ + --cuda +else + echo "use_cuda: False" + python -u 1_variant_effect_prediction.py \ + "$outdir/$vcf_basename" \ + $outdir \ + --genome=${hg_version} +fi diff --git a/2_raw_sc_score.sh b/2_raw_sc_score.sh new file mode 100755 index 0000000..5dd4f15 --- /dev/null +++ b/2_raw_sc_score.sh @@ -0,0 +1,23 @@ +#!/usr/bin/env bash + +##################################################################### +# Example script for computing the raw sequence class scores +# given Sei sequence predictions (NO variant effect, not adjusted for +# nucleosome occupancy) + +# Usage: +# sh 2_raw_sc_score.sh +##################################################################### + +set -o errexit +set -o pipefail +set -o nounset + +input_filepath="${1:-}" +out_dir="${2:-}" + +mkdir -p $out_dir + +echo "Input argments: $input_filepath $out_dir" + +python -u 2_raw_sc_score.py $input_filepath $out_dir diff --git a/2_varianteffect_sc_score.sh b/2_varianteffect_sc_score.sh new file mode 100755 index 0000000..16f5c6b --- /dev/null +++ b/2_varianteffect_sc_score.sh @@ -0,0 +1,33 @@ +#!/usr/bin/env bash + +##################################################################### +# Example script for computing the variant effect sequence class scores +# given Sei sequence predictions + +# Usage: +# sh 2_varianteffect_sc_score.sh +# [--no-tsv] +##################################################################### + +set -o errexit +set -o pipefail +set -o nounset + +ref_fp="${1:-}" +alt_fp="${2:-}" +out_dir="${3:-}" +no_tsv="${4:-}" + +mkdir -p $out_dir + +echo "Input argments: $ref_fp $alt_fp $out_dir $no_tsv" + +if [ "$no_tsv" = "--no-tsv" ] +then + echo "--no-tsv flag is used" + python -u 2_varianteffect_sc_score.py $ref_fp $alt_fp $out_dir --no-tsv +else + echo "--no-tsv flag is not used" + python -u 2_varianteffect_sc_score.py $ref_fp $alt_fp $out_dir +fi + From a5a005d6b5f6203feb63ea5a026fae664ddc93fe Mon Sep 17 00:00:00 2001 From: Kathy Date: Fri, 7 Oct 2022 14:56:35 -0400 Subject: [PATCH 12/17] update yaml file names --- 1_variant_effect_prediction.py | 2 +- TODOs | 4 -- model/sei_seq_prediction.yml | 23 ++++++++++ ...n.yml => sei_varianteffect_prediction.yml} | 4 +- run_pipeline.gpu_node.sh | 18 -------- run_pipeline.sh | 43 ------------------- 6 files changed, 26 insertions(+), 68 deletions(-) delete mode 100644 TODOs create mode 100755 model/sei_seq_prediction.yml rename model/{sei_prediction.yml => sei_varianteffect_prediction.yml} (88%) delete mode 100755 run_pipeline.gpu_node.sh delete mode 100755 run_pipeline.sh diff --git a/1_variant_effect_prediction.py b/1_variant_effect_prediction.py index 2ce5fea..622038b 100755 --- a/1_variant_effect_prediction.py +++ b/1_variant_effect_prediction.py @@ -69,5 +69,5 @@ def run_config(config_yml, output_dir): sei_out = os.path.join(arguments[""], "chromatin-profiles-hdf5") os.makedirs(sei_out, exist_ok=True) - run_config("./model/sei_prediction.yml", sei_out) + run_config("./model/sei_varianteffect_prediction.yml", sei_out) diff --git a/TODOs b/TODOs deleted file mode 100644 index 0d2861c..0000000 --- a/TODOs +++ /dev/null @@ -1,4 +0,0 @@ -- `sequence_class.py` no longer exists, need to update `run_pipeline.sh` -- need new sequence prediction config -- `sequence_class.from_seq_prediction.py` is not done -- nothing is tested yet diff --git a/model/sei_seq_prediction.yml b/model/sei_seq_prediction.yml new file mode 100755 index 0000000..1a78b7f --- /dev/null +++ b/model/sei_seq_prediction.yml @@ -0,0 +1,23 @@ +--- +ops: [analyze] +model: { + path: /model/sei.py, + class: Sei, + class_args: { + }, + non_strand_specific: mean +} +analyze_sequences: !obj:selene_sdk.predict.AnalyzeSequences { + sequence_length: 4096, + batch_size: 128, + trained_model_path: /model/sei.pth, + features: !obj:selene_sdk.utils.load_features_list { + input_path: /model/target.names + }, + write_mem_limit: 1000 +} +prediction: { + output_format: hdf5 +} +random_seed: 999 +... diff --git a/model/sei_prediction.yml b/model/sei_varianteffect_prediction.yml similarity index 88% rename from model/sei_prediction.yml rename to model/sei_varianteffect_prediction.yml index 0da8e2f..ae0b0bb 100755 --- a/model/sei_prediction.yml +++ b/model/sei_varianteffect_prediction.yml @@ -14,10 +14,10 @@ analyze_sequences: !obj:selene_sdk.predict.AnalyzeSequences { features: !obj:selene_sdk.utils.load_features_list { input_path: /model/target.names }, - write_mem_limit: 100000 + write_mem_limit: 1000 } variant_effect_prediction: { - save_data: [predictions], + save_data: [diffs, predictions], output_format: hdf5 } random_seed: 999 diff --git a/run_pipeline.gpu_node.sh b/run_pipeline.gpu_node.sh deleted file mode 100755 index 31dd7d5..0000000 --- a/run_pipeline.gpu_node.sh +++ /dev/null @@ -1,18 +0,0 @@ -#!/bin/bash -#SBATCH --time=1-00:00:00 -#SBATCH --gres=gpu:1 -#SBATCH --constraint=v100 -#SBATCH --partition=gpu -#SBATCH -n 1 -#SBATCH --mem 50G -#SBATCH --mail-type=FAIL -#SBATCH --mail-user= - -# Example SLURM script for running the Sei framework code - -vcf_filepath="${1:-}" # path to vcf file -hg_version="${2:-}" # hg19 or hg38 -outdir="${3:-}" # path to output dir -cuda="${4:-}" # --cuda flag - -sh run_pipeline.sh $vcf_filepath $hg_version $outdir $cuda diff --git a/run_pipeline.sh b/run_pipeline.sh deleted file mode 100755 index 8e66dfa..0000000 --- a/run_pipeline.sh +++ /dev/null @@ -1,43 +0,0 @@ -#!/usr/bin/env bash - -############################################################### -# Example script for running the variant effect prediction -# pipeline for Sei and sequence classes. - -# sh run_pipeline.sh --cuda - -# Please only specify hg38 or hg19 as input for . - -# --cuda is optional, use if you are running on a CUDA-enabled -# GPU machine -############################################################### - -set -o errexit -set -o pipefail -set -o nounset - -vcf_filepath="${1:-}" -hg_version="${2:-}" -outdir="${3:-}" -cuda="${4:-}" - -mkdir -p $outdir - -vcf_basename=$(basename $vcf_filepath) -cp $vcf_filepath $outdir/ - -if [ "$cuda" = "--cuda" ] -then - echo "use_cuda: True" - python -u vep_cli.py "$outdir/$vcf_basename" \ - $outdir \ - --genome=${hg_version} \ - --cuda -else - echo "use_cuda: False" - python -u vep_cli.py "$outdir/$vcf_basename" \ - $outdir \ - --genome=${hg_version} -fi - -python sequence_class.py $outdir "$vcf_basename" From 5febfc09e6d6c88e7a6282d7db9dc04cad65ad37 Mon Sep 17 00:00:00 2001 From: Kathy Date: Fri, 7 Oct 2022 15:44:43 -0400 Subject: [PATCH 13/17] reviewed code/script documentation --- 1_sequence_prediction.py | 1 + 1_sequence_prediction.sh | 2 +- 1_variant_effect_prediction.py | 5 +++-- 1_variant_effect_prediction.sh | 8 ++++---- 2_raw_sc_score.py | 14 ++++++-------- 2_raw_sc_score.sh | 3 +-- 2_varianteffect_sc_score.py | 9 +++++---- 2_varianteffect_sc_score.sh | 2 +- 8 files changed, 22 insertions(+), 22 deletions(-) diff --git a/1_sequence_prediction.py b/1_sequence_prediction.py index 3bba021..3baf42c 100755 --- a/1_sequence_prediction.py +++ b/1_sequence_prediction.py @@ -60,6 +60,7 @@ def _finditem(obj, val): configs["prediction"].update(input_path=seq_input, output_dir=sei_out) configs["analyze_sequences"].bind(use_cuda=use_cuda) + # Assumes BED file coordinates input if file doesn't end with .fa or .fasta if not seq_input.endswith('.fa') and not seq_input.endswith('.fasta'): hg_version = arguments["--genome"] genome = None diff --git a/1_sequence_prediction.sh b/1_sequence_prediction.sh index c44c859..e1149df 100755 --- a/1_sequence_prediction.sh +++ b/1_sequence_prediction.sh @@ -13,7 +13,7 @@ # as part of output for this script) but do not leave it empty. # --cuda is optional, use if you are running on a CUDA-enabled -# GPU machine (see 1_example_seqpred.slurm_gpu.sh) +# GPU machine (see example_slurm_scripts/1_example_seqpred.slurm_gpu.sh) ##################################################################### set -o errexit diff --git a/1_variant_effect_prediction.py b/1_variant_effect_prediction.py index 622038b..e6e9a9b 100755 --- a/1_variant_effect_prediction.py +++ b/1_variant_effect_prediction.py @@ -1,7 +1,8 @@ """ Description: - CLI for variant effect prediction using the Sei deep learning model. - Outputs both Sei chromatin profile predictions and sequence class scores. + CLI for variant effect prediction using the Sei deep learning model, + given an input VCF file. + Outputs Sei chromatin profile predictions. Usage: 1_variant_effect_prediction.py [--genome=] [--cuda] diff --git a/1_variant_effect_prediction.sh b/1_variant_effect_prediction.sh index f2e7a08..e081124 100755 --- a/1_variant_effect_prediction.sh +++ b/1_variant_effect_prediction.sh @@ -1,16 +1,16 @@ #!/usr/bin/env bash ############################################################### -# Example script for running the variant effect prediction -# pipeline for Sei and sequence classes. +# Example script for running Sei variant effect prediction +# using Selene. # Usage: -# sh run_pipeline.sh [--cuda] +# sh 1_variant_effect_prediction.sh [--cuda] # Please only specify hg38 or hg19 as input for . # --cuda is optional, use if you are running on a CUDA-enabled -# GPU machine (see 1_example_vep.slurm_gpu.sh) +# GPU machine (see example_slurm_scripts/1_example_vep.slurm_gpu.sh) ############################################################### set -o errexit diff --git a/2_raw_sc_score.py b/2_raw_sc_score.py index cc3fb89..e859fc2 100755 --- a/2_raw_sc_score.py +++ b/2_raw_sc_score.py @@ -1,9 +1,8 @@ """ Description: - This script is run after `seq_prediction_cli.py`. It computes the - sequence class-level scores based on Sei chromatin profile predictions, - and by default, sorts the variants based on the maximum absolute scores - across sequence classes and outputs the results as TSV files. + This script is run after `1_sequence_prediction.py`. It computes the + raw sequence class scores based on Sei chromatin profile predictions + of sequences (i.e. no variants). Usage: 2_raw_sc_score.py @@ -14,10 +13,9 @@ Path to Sei sequence predictions to compute raw sequence class projection scores (NO variant effect). The directory to output the sequence class scores - & TSVs (if `--no-tsv` is not used). - --out-name= Specify an output filename prefix that all outputted - files will use. Otherwise, filenames will be based on - . + as an .NPY file. + --out-name= Specify an output filename prefix. Otherwise, output + filenames will be based on . """ import os diff --git a/2_raw_sc_score.sh b/2_raw_sc_score.sh index 5dd4f15..311cec0 100755 --- a/2_raw_sc_score.sh +++ b/2_raw_sc_score.sh @@ -2,8 +2,7 @@ ##################################################################### # Example script for computing the raw sequence class scores -# given Sei sequence predictions (NO variant effect, not adjusted for -# nucleosome occupancy) +# given Sei chromatin profile sequence predictions # Usage: # sh 2_raw_sc_score.sh diff --git a/2_varianteffect_sc_score.py b/2_varianteffect_sc_score.py index 8b88db0..931e8d9 100755 --- a/2_varianteffect_sc_score.py +++ b/2_varianteffect_sc_score.py @@ -1,9 +1,10 @@ """ Description: - This script is run after `vep_cli.py`. It computes the sequence class-level - variant effect scores based on Sei chromatin profile predictions, - and by default, sorts the variants based on the maximum absolute scores - across sequence classes and outputs the results as TSV files. + This script is run after `1_variant_effect_prediction.py`. It computes + the variant effect sequence class scores based on Sei chromatin profile + predictions, and by default, sorts the variants based on the maximum + absolute scores across sequence classes, outputting the results as + TSV files. Usage: 2_varianteffect_sc_score.py diff --git a/2_varianteffect_sc_score.sh b/2_varianteffect_sc_score.sh index 16f5c6b..1f742d7 100755 --- a/2_varianteffect_sc_score.sh +++ b/2_varianteffect_sc_score.sh @@ -2,7 +2,7 @@ ##################################################################### # Example script for computing the variant effect sequence class scores -# given Sei sequence predictions +# given Sei chromatin profile variant effect predictions # Usage: # sh 2_varianteffect_sc_score.sh From 09c05aa2f72e0c89193ebc586bb2bb93766a2caa Mon Sep 17 00:00:00 2001 From: Kathy Date: Fri, 7 Oct 2022 17:34:48 -0400 Subject: [PATCH 14/17] full update of README --- README.md | 103 +++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 83 insertions(+), 20 deletions(-) diff --git a/README.md b/README.md index 9d7afc5..2caec17 100755 --- a/README.md +++ b/README.md @@ -16,10 +16,6 @@ We also provide information and instructions for [how to train the Sei deep lear Sei requires Python 3.6+ and Python packages PyTorch (>=1.0), Selene (>=0.5.0), and `docopt`. You can follow PyTorch installation steps [here](https://pytorch.org/get-started/locally/) and Selene installation steps [here](https://github.com/FunctionLab/selene). Install `docopt` with pip or conda (e.g. `conda install docopt`) -## Variant effect prediction - -Sei predicts variant effects by comparing predictions from a pair of sequences carrying the reference allele and the alternative allele respectively. Our code provides both sequence-class level (40 classes) and chromatin profile-level (21,907 targets) predictions. - ### Setup Please download and extract the trained Sei model and `resources` (containing hg19 and hg38 FASTA files) `.tar.gz` files before proceeding: @@ -31,38 +27,105 @@ sh ./download_data.sh - [Sei model](https://doi.org/10.5281/zenodo.4906996) - [Sei framework `resources` directory](https://doi.org/10.5281/zenodo.4906961) -### Usage +### Chromatin profile prediction + +1. The following scripts can be used to obtain Sei deep learning predictions for 21,907 chromatin profiles (please run on a GPU node): +(1) `1_sequence_prediction.py` (and corresponding bash script, `1_sequence_prediction.sh`): Accepts either a BED (`.bed`) or FASTA (`.fa`, `.fasta`) file as input and makes sequence predictions. + +Example usage: ``` -sh run_pipeline.sh [--cuda] +sh 1_sequence_prediction.sh --cuda ``` -Example command + +Arguments: +- ``: BED or FASTA input file +- ``: If you use a BED file as input, this must be either `hg19` or `hg38` as these are the FASTA reference genome files we provide by default. If you are using a FASTA file, you can specify whichever genome version you are using for logging purposes. +- ``: Path to output directory (will be created if does not exist) +- `--cuda`: Optional, use this flag if running on a CUDA-enabled GPU. + +You can run `python 1_sequence_prediction.py -h` for the full documentation of inputs. + +2. `1_variant_effect_prediction.py` (and corresponding bash script, `1_variant_effect_prediction.sh`): Accepts a VCF file as input and makes variant effect predictions. + +Example usage: ``` -sh run_pipeline.sh test.vcf hg19 test-output --cuda +sh 1_variant_effect_prediction.sh [--cuda] ``` Arguments: -- ``: Input VCF file -- ``: Reference FASTA. By default the framework accepts `hg19` or `hg38` coordinates. +- ``: VCF file +- ``: Either hg19 or hg38 - ``: Path to output directory (will be created if does not exist) - `--cuda`: Optional, use this flag if running on a CUDA-enabled GPU. -We provide `test.vcf` (hg19 coordinates) so you can try running this command once you have installed all the requirements. Additionally, `run_pipeline.gpu_node.sh` is an example SLURM script with the same expected input arguments if you need to submit your job to a compute cluster. +You can run `python 1_variant_effect_prediction.py -h` for the full documentation of inputs. + +These scripts will output the chromatin profile predictions as HDF5 files to a subdirectory `chromatin-profiles-hdf5` in your specified output directory. + +See `example_slurm_scripts/1_example_seqpred.slurm_gpu.sh` and `example_slurm_scripts/1_example_vep.slurm_gpu.sh` for sample scripts for running chromatin profile prediction on SLURM. + + +### Sequence class prediction + +Sequence class scores can be obtained from Sei chromatin profile predictions. There are 2 types of scores that can be computed: + +- Raw sequence class scores: For sequences only. **Note** our manuscript uses the Louvain community clustering, whole-genome sequence class annotation of the human genome whenever we apply sequence classes to reference genome sequences, and we encourage the use of these annotations over the raw sequence class scores when possible. Sequence class annotations for hg38 and hg19 (lifted over from hg38) are available for download from [this Zenodo record](10.5281/zenodo.7113989). +- Sequence class variant effect score (nucleosome-occupancy-adjusted): For variants only. Computed as alt - ref of the raw sequence class scores **adjusted for nucleosome occupancy, i.e. histone normalized**. To better represent predicted variant effects on histone marks, it is necessary to normalize for nucleosome occupancy (for example, a LoF mutation near the TSS can decrease H3K4me3 modification level while increasing nucleosome occupancy, resulting in an overall increase in observed H3K4me3 quantity). Therefore, for variant effect computation, we used the sum of all histone profile predictions as an approximation to nucleosome occupancy and adjusted all histone mark predictions to remove the impact of nucleosome occupancy change (nonhistone mark predictions are unchanged). See manuscript methods for more detail. + +#### Sequence prediction + +Example usage: +``` +sh 2_raw_sc_score.sh +``` + +Arguments: +- ``: Path to the Sei `_predictions.h5` file. +- ``: Path to output directory (will be created if does not exist) -**Additional note**: we have added the capability of predicting variant effects from a pair of sequences in the `vep_cli_seq.py` script in the [`vep_seq`](https://github.com/FunctionLab/sei-framework/tree/vep_seq) development branch of this repo. +You can run `python 2_raw_sc_score.py -h` for the full documentation of inputs. -### Outputs +#### Variant effect prediction -The following files and directories will be outputted: -- `chromatin-profiles-hdf5`: directory -- `chromatin_profiles_diffs.tsv`: chromatin profile prediction TSV file (**Note:** output file will be compressed if input has >10000 variants) -- `sequence_classes_scores.tsv`: sequence class prediction TSV file +Example usage: +``` +sh 2_varianteffect_sc_score.sh [--no-tsv] +``` + +Arguments: +- ``: Path to the Sei `.ref_predictions.h5` file. +- ``: Path to the Sei `.alt_predictions.h5` file. +- ``: Path to output directory (will be created if does not exist) +- `--no-tsv`: Optional flag if you'd like to suppress the outputted TSV files (see the next section 'Example variant effect prediction run' for more information). + +You can run `python 2_varianteffect_sc_score.py -h` for the full documentation of inputs. + +### Example variant effect prediction run: + +We provide `test.vcf` (hg19 coordinates) so you can try running this command once you have installed all the requirements. Additionally, `example_slurm_scripts` contains example scripts with the same expected input arguments if you need to submit your job to a compute cluster. + +Example command run on GPU: +``` +sh 1_variant_effect_prediction.sh test.vcf hg19 ./test_outputs --cuda +``` + +Example command run on CPU: +``` +sh 2_varianteffect_sc_score.sh ./test_outputs/chromatin-profiles-hdf5/test.ref_predictions.h5 \ + ./test_outputs/chromatin-profiles-hdf5/test.alt_predictions.h5 \ + ./test_outputs +``` +You can add `--no-tsv` to this command to suppress the TSV file outputs if you are comfortable working with HDF5 and NPY files. Note you will need to match the rows to the `test_row_labels.txt` file in `./test_outputs/chromatin-profiles.hdf5` and the columns to `./model/target.names` (chromatin profile HDF5 files) and `./model/seqclass.names` (sequence class NPY file). -The two `*.tsv` files are the final formatted outputs, while the `chromatin-profiles-hdf5` directory contains the intermediate HDF5 and row label files outputted from Selene from running the Sei deep learning model. -You can use the HDF5 files directly if desired, but please keep in mind that the variants will not be ordered in the same way as the TSV files. (Please see the corresponding `*_row_labels.txt` files in `chromatin-profiles-hdf5`, for the variant labels.) +Expected outputs: +- `chromatin-profiles-hdf5`: directory containing HDF5 Sei predictions files and the corresponding `test_row_labels.txt` file. +- `sorted.test.chromatin_profile_diffs.tsv`: chromatin profile prediction TSV file (**Note:** output file will be compressed if input has >10000 variants), sorted by max absolute sequence class score. +- `sorted.test.sequence_class_scores.tsv`: sequence class prediction TSV file, sorted by max absolute sequence class scores. +- `test.sequence_class_scores.npy`: sequence class scores NPY file, note this is NOT sorted and will be ordered in the same way as `chromatin-profiles-hdf5/test_row_labels.txt` file. -### Sequence classes +## Sequence classes Sequence classes are defined based on 30 million sequences tiling the genome and thus cover a wide range of sequence activities. To help interpretation, we grouped sequence classes into groups including P (Promoter), E (Enhancer), CTCF (CTCF-cohesin binding), TF (TF binding), PC (Polycomb-repressed), HET (Heterochromatin), TN (Transcription), and L (Low Signal) sequence classes. Please refer to our manuscript for a more detailed description of the sequence classes. From e194a8397e721764cb2854fba6a5f6aef43277bc Mon Sep 17 00:00:00 2001 From: Kathy Date: Mon, 10 Oct 2022 15:09:14 -0400 Subject: [PATCH 15/17] forgot to add the example slurm scripts directory --- .../1_example_seqpred.slurm_gpu.sh | 18 ++++++++++++++++++ .../1_example_vep.slurm_gpu.sh | 18 ++++++++++++++++++ .../2_example_raw_sc.slurm_cpu.sh | 11 +++++++++++ .../2_example_vep_sc.slurm_cpu.sh | 13 +++++++++++++ 4 files changed, 60 insertions(+) create mode 100755 example_slurm_scripts/1_example_seqpred.slurm_gpu.sh create mode 100755 example_slurm_scripts/1_example_vep.slurm_gpu.sh create mode 100755 example_slurm_scripts/2_example_raw_sc.slurm_cpu.sh create mode 100755 example_slurm_scripts/2_example_vep_sc.slurm_cpu.sh diff --git a/example_slurm_scripts/1_example_seqpred.slurm_gpu.sh b/example_slurm_scripts/1_example_seqpred.slurm_gpu.sh new file mode 100755 index 0000000..3bd2763 --- /dev/null +++ b/example_slurm_scripts/1_example_seqpred.slurm_gpu.sh @@ -0,0 +1,18 @@ +#!/bin/bash +#SBATCH --time=1-00:00:00 +#SBATCH --gres=gpu:1 +#SBATCH --constraint=v100 +#SBATCH --partition=gpu +#SBATCH -n 1 +#SBATCH --mem 100G +#SBATCH --mail-type=FAIL +#SBATCH --mail-user= + +# Example SLURM script for running the Sei framework code +# on input sequences from a BED or FASTA file + +input_filepath="${1:-}" # path to input bed/fasta file +genome_version="${2:-}" # genome version +outdir="${3:-}" # path to output dir + +sh ../1_sequence_prediction.sh $input_filepath $genome_version $outdir --cuda diff --git a/example_slurm_scripts/1_example_vep.slurm_gpu.sh b/example_slurm_scripts/1_example_vep.slurm_gpu.sh new file mode 100755 index 0000000..5652871 --- /dev/null +++ b/example_slurm_scripts/1_example_vep.slurm_gpu.sh @@ -0,0 +1,18 @@ +#!/bin/bash +#SBATCH --time=1-00:00:00 +#SBATCH --gres=gpu:1 +#SBATCH --constraint=v100 +#SBATCH --partition=gpu +#SBATCH -n 1 +#SBATCH --mem 100G +#SBATCH --mail-type=FAIL +#SBATCH --mail-user= + +# Example SLURM script for running the Sei framework code +# on variants in the input VCF file + +vcf_filepath="${1:-}" # path to vcf file +hg_version="${2:-}" # hg19 or hg38 +outdir="${3:-}" # path to output dir + +sh ../1_variant_effect_prediction.sh $vcf_filepath $hg_version $outdir --cuda diff --git a/example_slurm_scripts/2_example_raw_sc.slurm_cpu.sh b/example_slurm_scripts/2_example_raw_sc.slurm_cpu.sh new file mode 100755 index 0000000..0cf322c --- /dev/null +++ b/example_slurm_scripts/2_example_raw_sc.slurm_cpu.sh @@ -0,0 +1,11 @@ +#!/bin/bash +#SBATCH --time=1-00:00:00 +#SBATCH --partition=ccb + +# Example SLURM script for getting the raw sequence class scores +# for input sequence predictions (i.e. no variants) + +input_preds="${1:-}" # input preds path +outdir="${2:-}" # output dir path + +sh ../2_raw_sc_score.sh $input_preds $outdir diff --git a/example_slurm_scripts/2_example_vep_sc.slurm_cpu.sh b/example_slurm_scripts/2_example_vep_sc.slurm_cpu.sh new file mode 100755 index 0000000..2c4c264 --- /dev/null +++ b/example_slurm_scripts/2_example_vep_sc.slurm_cpu.sh @@ -0,0 +1,13 @@ +#!/bin/bash +#SBATCH --time=1-00:00:00 +#SBATCH --partition=ccb + + +# Example SLURM script for getting the variant effect sequence class scores + +ref_preds="${1:-}" # ref preds path +alt_preds="${2:-}" # alt preds path +outdir="${3:-}" # output dir path +no_tsv="${4:-}" # --no-tsv flag + +sh ../2_varianteffect_sc_score.sh $ref_preds $alt_preds $outdir $no_tsv From a974289fbaa1e5f3ec6868f48570e771412f205e Mon Sep 17 00:00:00 2001 From: Kathy Date: Mon, 10 Oct 2022 15:10:13 -0400 Subject: [PATCH 16/17] adjust README --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 2caec17..f77665c 100755 --- a/README.md +++ b/README.md @@ -70,7 +70,7 @@ See `example_slurm_scripts/1_example_seqpred.slurm_gpu.sh` and `example_slurm_sc Sequence class scores can be obtained from Sei chromatin profile predictions. There are 2 types of scores that can be computed: -- Raw sequence class scores: For sequences only. **Note** our manuscript uses the Louvain community clustering, whole-genome sequence class annotation of the human genome whenever we apply sequence classes to reference genome sequences, and we encourage the use of these annotations over the raw sequence class scores when possible. Sequence class annotations for hg38 and hg19 (lifted over from hg38) are available for download from [this Zenodo record](10.5281/zenodo.7113989). +- Raw sequence class scores: For sequences only. Raw sequence class scores are projection scores of chromatin profile predictions projected on the unit-length vectors representing each sequence class. This is an intermediate score originally developed for variant score prediction and is made available for use for developing downstream analyses or applications, such as using them as a sequence representation. **Note** our manuscript uses the Louvain community clustering, whole-genome sequence class annotation of the human genome whenever we apply sequence classes to reference genome sequences, and we encourage the use of these annotations over the raw sequence class scores when possible. Sequence class annotations for hg38 and hg19 (lifted over from hg38) are available for download from [this Zenodo record](10.5281/zenodo.7113989). - Sequence class variant effect score (nucleosome-occupancy-adjusted): For variants only. Computed as alt - ref of the raw sequence class scores **adjusted for nucleosome occupancy, i.e. histone normalized**. To better represent predicted variant effects on histone marks, it is necessary to normalize for nucleosome occupancy (for example, a LoF mutation near the TSS can decrease H3K4me3 modification level while increasing nucleosome occupancy, resulting in an overall increase in observed H3K4me3 quantity). Therefore, for variant effect computation, we used the sum of all histone profile predictions as an approximation to nucleosome occupancy and adjusted all histone mark predictions to remove the impact of nucleosome occupancy change (nonhistone mark predictions are unchanged). See manuscript methods for more detail. #### Sequence prediction From f87de4c2429ee3cf8d49667a342e7aac7edf8b5d Mon Sep 17 00:00:00 2001 From: Kathy Date: Mon, 10 Oct 2022 15:57:07 -0400 Subject: [PATCH 17/17] allow example slurm scripts to be run from main repo dir --- example_slurm_scripts/1_example_seqpred.slurm_gpu.sh | 2 +- example_slurm_scripts/1_example_vep.slurm_gpu.sh | 2 +- example_slurm_scripts/2_example_raw_sc.slurm_cpu.sh | 2 +- example_slurm_scripts/2_example_vep_sc.slurm_cpu.sh | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/example_slurm_scripts/1_example_seqpred.slurm_gpu.sh b/example_slurm_scripts/1_example_seqpred.slurm_gpu.sh index 3bd2763..16d4f1a 100755 --- a/example_slurm_scripts/1_example_seqpred.slurm_gpu.sh +++ b/example_slurm_scripts/1_example_seqpred.slurm_gpu.sh @@ -15,4 +15,4 @@ input_filepath="${1:-}" # path to input bed/fasta file genome_version="${2:-}" # genome version outdir="${3:-}" # path to output dir -sh ../1_sequence_prediction.sh $input_filepath $genome_version $outdir --cuda +sh ./1_sequence_prediction.sh $input_filepath $genome_version $outdir --cuda diff --git a/example_slurm_scripts/1_example_vep.slurm_gpu.sh b/example_slurm_scripts/1_example_vep.slurm_gpu.sh index 5652871..3820efb 100755 --- a/example_slurm_scripts/1_example_vep.slurm_gpu.sh +++ b/example_slurm_scripts/1_example_vep.slurm_gpu.sh @@ -15,4 +15,4 @@ vcf_filepath="${1:-}" # path to vcf file hg_version="${2:-}" # hg19 or hg38 outdir="${3:-}" # path to output dir -sh ../1_variant_effect_prediction.sh $vcf_filepath $hg_version $outdir --cuda +sh ./1_variant_effect_prediction.sh $vcf_filepath $hg_version $outdir --cuda diff --git a/example_slurm_scripts/2_example_raw_sc.slurm_cpu.sh b/example_slurm_scripts/2_example_raw_sc.slurm_cpu.sh index 0cf322c..a88cb42 100755 --- a/example_slurm_scripts/2_example_raw_sc.slurm_cpu.sh +++ b/example_slurm_scripts/2_example_raw_sc.slurm_cpu.sh @@ -8,4 +8,4 @@ input_preds="${1:-}" # input preds path outdir="${2:-}" # output dir path -sh ../2_raw_sc_score.sh $input_preds $outdir +sh ./2_raw_sc_score.sh $input_preds $outdir diff --git a/example_slurm_scripts/2_example_vep_sc.slurm_cpu.sh b/example_slurm_scripts/2_example_vep_sc.slurm_cpu.sh index 2c4c264..35e90a8 100755 --- a/example_slurm_scripts/2_example_vep_sc.slurm_cpu.sh +++ b/example_slurm_scripts/2_example_vep_sc.slurm_cpu.sh @@ -10,4 +10,4 @@ alt_preds="${2:-}" # alt preds path outdir="${3:-}" # output dir path no_tsv="${4:-}" # --no-tsv flag -sh ../2_varianteffect_sc_score.sh $ref_preds $alt_preds $outdir $no_tsv +sh ./2_varianteffect_sc_score.sh $ref_preds $alt_preds $outdir $no_tsv