Transform ekore in header-only C++ library #246
Conversation
@felixhekhorn thanks for this, finally a sensible choice, before moving to messy languages and compilation tools 😆.
When I double-checked the last statement, I realized there is some caching going on ... so I removed all caching (I hope) and now we have a small gain here: t(c) ~ 2s (of course, as before) vs. t(numba) ~ 2.6s. Also, of course, just calling CFFI remains at 0.5s.
so really the non-cacheability is killing this PR |
Is CFFI recompiling at each call? It seems silly and unbelievable... |
Okay, you're right, I forgot you started from CERNlib (I was actually considering it as completely external to EKO). Is CFFI usage then killing the Numba cache?
yes! see the warning in the description:
Okay, I see, maybe I got their point: a Numba cached function should be more or less a ... There are a lot of maybes in this explanation; I have no idea about Numba internals (other than that they are using a stripped-down version of LLVM). But yes, if they have no mechanism to link to other ... What about replacing ...?
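To make the caching intuition concrete - this is only a toy model in plain Python, not actual Numba internals - an on-disk cache that froze a raw process address inside a compiled artifact would be worthless in the next process, since addresses are only valid within the process that produced them:

```python
import pickle

# toy stand-in for a compiled artifact that embeds a raw runtime address
# (as a numba cfunc address would be); `id(print)` is just "some address"
artifact = {"machine_code": b"...call [addr]...", "addr": id(print)}

# serializing it is what an on-disk cache would do: the address is frozen,
# so a cache hit in a *fresh* process would jump to garbage
blob = pickle.dumps(artifact)
restored = pickle.loads(blob)
print(restored["addr"] == artifact["addr"])  # True here, meaningless elsewhere
```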
oookay, playing a bit more - this is possible: I attach the relevant files here below. However, it is not clear to me how this would help us ... fine, we would not need to port everything at once, but we can not tackle the main obstacle: the conjecture is ...

`cfunc.c`:

```c
// gcc -shared -o cfunc.so -fPIC cfunc.c
#include <stdio.h>

typedef double (*fnc)(double, double);

// call back into a function pointer handed over from the outside
double do_add(fnc f, double a, double b) {
    return f(a, b);
}

double my_add(double a, double b) { return a + b; }
```

`cfunc.py`:

```python
from cffi import FFI
import numba as nb
import numpy as np
import ctypes
from cernlibpy import cern_polygamma

# step 1: we create a numba c-function
# ------------------------------------
@nb.cfunc("float64(float64, float64)")
def add(x, y):
    return x + y + np.real(cern_polygamma(4., 0))

print("calling add.ctypes", add.ctypes(4.0, 5.0))  # 4 + 5 + Re psi(4)
# print(add.address)

# step 2a: we call add in c via cffi
# ----------------------------------
ffi = FFI()
cernlibc = ffi.dlopen('./cfunc.so')
ffi.cdef("""
double do_add(void* f, double a, double b);
""")
# we need to "activate" the actual function first
c_do_add = cernlibc.do_add

@nb.njit()  # we need this here, else FFI complains
def do_add(addr, a, b):
    res = c_do_add(addr, a, b)
    return res

print("calling do_add via numba", do_add(add.address, 4., 6.))

# step 2b: or we call add in c via ctypes
# ---------------------------------------
libc = ctypes.CDLL("cfunc.so")
my_add_py = libc.my_add
my_add_py.argtypes = [ctypes.c_double, ctypes.c_double]
my_add_py.restype = ctypes.c_double
print("calling my_add via ctypes ", my_add_py(4., 7.))

do_add_py = libc.do_add
# the callback signature has to use c_double - c_float does not match the C side
CMPFUNC_t = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)
do_add_py.argtypes = [CMPFUNC_t, ctypes.c_double, ctypes.c_double]
do_add_py.restype = ctypes.c_double
print("calling do_add via ctypes", do_add_py(ctypes.cast(add.address, CMPFUNC_t), 4., 8.))
```

`cernlibpy.py`:

```python
import numpy as np
import numba as nb


@nb.njit()
def cern_polygamma(Z, K):
    # fmt: off
    DELTA = 5e-13
    R1 = 1
    HF = R1/2
    C1 = np.pi**2
    C2 = 2*np.pi**3
    C3 = 2*np.pi**4
    C4 = 8*np.pi**5
    # SGN is originally indexed 0:4 -> no shift
    SGN = [-1,+1,-1,+1,-1]
    # FCT is originally indexed -1:4 -> shift +1
    FCT = [0,1,1,2,6,24]
    # C is originally indexed 1:6 x 0:4 -> swap indices and shift new last -1
    C = nb.typed.List()
    C.append([
        8.33333333333333333e-2,
        -8.33333333333333333e-3,
        3.96825396825396825e-3,
        -4.16666666666666667e-3,
        7.57575757575757576e-3,
        -2.10927960927960928e-2])
    C.append([
        1.66666666666666667e-1,
        -3.33333333333333333e-2,
        2.38095238095238095e-2,
        -3.33333333333333333e-2,
        7.57575757575757576e-2,
        -2.53113553113553114e-1])
    C.append([
        5.00000000000000000e-1,
        -1.66666666666666667e-1,
        1.66666666666666667e-1,
        -3.00000000000000000e-1,
        8.33333333333333333e-1,
        -3.29047619047619048e+0])
    C.append([
        2.00000000000000000e+0,
        -1.00000000000000000e+0,
        1.33333333333333333e+0,
        -3.00000000000000000e+0,
        1.00000000000000000e+1,
        -4.60666666666666667e+1])
    C.append([10., -7., 12., -33., 130., -691.])
    U=Z
    X=np.real(U)
    A=np.abs(X)
    if K < 0 or K > 4:
        raise NotImplementedError("Order K has to be in [0:4]")
    if np.abs(np.imag(U)) < DELTA and np.abs(X+int(A)) < DELTA:
        raise ValueError("Argument Z equals non-positive integer")
    K1=K+1
    if X < 0:
        U=-U
    V=U
    H=0
    if A < 15:
        H=1/V**K1
        for I in range(1,14-int(A)+1):
            V=V+1
            H=H+1/V**K1
        V=V+1
    R=1/V**2
    P=R*C[K][6-1]
    for I in range(5,1-1,-1):
        P=R*(C[K][I-1]+P)
    H=SGN[K]*(FCT[K+1]*H+(V*(FCT[K-1+1]+P)+HF*FCT[K+1])/V**K1)
    if K == 0:
        H=H+np.log(V)
    if X < 0:
        V=np.pi*U
        X=np.real(V)
        Y=np.imag(V)
        A=np.sin(X)
        B=np.cos(X)
        T=np.tanh(Y)
        P=complex(B,-A*T)/complex(A,B*T)
        if K == 0:
            H=H+1/U+np.pi*P
        elif K == 1:
            H=-H+1/U**2+C1*(P**2+1)
        elif K == 2:
            H=H+2/U**3+C2*P*(P**2+1)
        elif K == 3:
            R=P**2
            H=-H+6/U**4+C3*((3*R+4)*R+1)
        elif K == 4:
            R=P**2
            H=H+24/U**5+C4*P*((3*R+5)*R+2)
    return H
    # fmt: on
```
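The step-2b trick - shipping a bare integer address and casting it back into a typed function pointer on the other side - can be sketched with ctypes alone, without numba or the compiled `cfunc.so` (all names here are illustrative):

```python
import ctypes

# a C-compatible signature: double (*)(double, double)
ADD_T = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)

@ADD_T
def add(a, b):
    # plain Python body, but callable through a C function pointer
    return a + b

# extract the raw integer address, as numba's `cfunc.address` would provide
addr = ctypes.cast(add, ctypes.c_void_p).value

# reconstruct a typed callable from the bare address and call through it
f = ctypes.cast(addr, ADD_T)
print(f(4.0, 5.0))  # 9.0
```

Note that the original `add` object must stay alive while the address is in use, otherwise the pointer dangles.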
Okay, this is now in a presentable state, so I'm asking everybody for some feedback on how to proceed @alecandido @giacomomagni @niclaurenti @t7phy (cc @scarrazza) ... let me summarize some points.

The goals I consider are: ...

I'm only considering the most simple setup here: FFNS, LO QCD, LHA.

I believe to have addressed some of the goals:

```cpp
#include "ekorepp/types.hpp"
#include "ekorepp/harmonics/cache.hpp"
#include "ekorepp/anomalous_dimensions/unpolarized/space_like.hpp"

int main() {
    // ...
    ekorepp::harmonics::Cache c(n);
    const ekorepp::tnsr res = ekorepp::anomalous_dimensions::unpolarized::spacelike::gamma_singlet(order_qcd, c, nf);
    // ...
}
```

this should be sufficient for most languages to interface. I opted for C++ because 1) I don't speak C and 2) I don't want to deal with pointers if I can avoid it, and we need some lists/tensors (`eko/src/eko/evolution_operator/c_quad_ker.cpp`, line 137 in 765582b).

c) as said, only ekore is ported, but nothing more. This means that in addition we need the actual interface, which is here under `eko/` (and not under `ekorepp/`). I.e. to run this branch you need to execute this line to compile the interface. In practice one needs to run this command every time something inside ekore changes.

d) the issue with caching discussed earlier in this PR is completely gone - we're caching everything. We lost JIT compilation, because I need a function pointer address to be passed around. This function we would still need to compile on the client side, but (I hope) that should be quick, because there are no 1000s of lines, but just a few. The major compilation (in C++) should instead now happen at wheel creation time.

e) we get a small speed up: running `EKO_LOG_STDOUT=1 poe lha -m "ffns and lo and not pol"` in master: ~50s; here: ~12s (without C++ optimization flags), ~9s (with `-O2` - no improvement for `-O3`). See also PS below.

f) despite the branch name there is no trace of CFFI left; we only use ctypes (which is standard library).

What do you think? Do we want to go down this road? There are of course still some problems left: ...

(any of the above points of course applies also to any other strategy)

PS: I believe in master we're not running full numba under quad - because there is this (`eko/src/eko/evolution_operator/__init__.py`, line 801 in 4dc65b4) ... `LowLevelCallable` as is done here (`eko/src/eko/evolution_operator/__init__.py`, line 389 in 765582b)
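For reference, the declare-then-call ctypes pattern this interface relies on, demonstrated against `libm` as a stand-in (the actual ekorepp symbols would be declared the same way; the library name and function are just a runnable example, not part of the PR):

```python
import ctypes
import ctypes.util

# load the C math library as a stand-in for the compiled interface .so
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# without these declarations ctypes assumes int arguments/returns
# and would silently corrupt the doubles
libm.tgamma.argtypes = [ctypes.c_double]
libm.tgamma.restype = ctypes.c_double

print(libm.tgamma(5.0))  # Gamma(5) = 4! = 24.0
```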
LGTM, if we improve by a factor of 5 it's quite good!
what about this https://breathe.readthedocs.io/en/latest/ ?
we can write something to automatise the process or import everything again from Mathematica, depending on the case.
This is just (not too complicated) string manipulation, we can do it :)
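A toy of the kind of string manipulation meant here - mechanically mapping (numba-)Python expressions onto their C++ spellings; the rule table and function name are made up for illustration, a real translator would need at least power-operator handling:

```python
import re

# illustrative Python -> C++ rewrite rules (far from complete)
RULES = [
    (r"np\.log\b", "std::log"),
    (r"np\.sin\b", "std::sin"),
    (r"np\.pi\b", "M_PI"),
]

def translate(line: str) -> str:
    """Apply each rewrite rule in turn to one line of Python source."""
    for pattern, replacement in RULES:
        line = re.sub(pattern, replacement, line)
    return line

print(translate("H = H + np.log(V) * np.pi"))  # H = H + std::log(V) * M_PI
```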
closed in favor of #189 |
Alternative to #189 by @alecandido

for recent status see this post

... (`cern_polygamma` function) ... `.c` file with the given command (of course, due to this, tests are failing) ... `master`, because we lost caching: ... which results in ...

... `cern_polygamma` ... the non-cacheability bubbles all the way up everywhere (as ... `ekore`) ... `void*`, but of course the actual functions use `double*` - I believe this is needed to convince numba to resolve the actual address for the call.

cc @scarrazza as we were discussing this a bit when you've been around 🙃
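The `void*`-vs-`double*` point is the usual dance of threading a function pointer through an opaque C parameter. The standard ctypes illustration of exactly this is libc's `qsort`, whose comparator is declared `int (*)(const void*, const void*)` on the C side while we declare typed pointers on the Python side (a generic sketch, unrelated to the eko sources):

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C expects int (*compar)(const void *, const void *); we declare the
# pointers as int* so ctypes dereferences them for us
CMP_T = ctypes.CFUNCTYPE(ctypes.c_int,
                         ctypes.POINTER(ctypes.c_int),
                         ctypes.POINTER(ctypes.c_int))

@CMP_T
def compare(a, b):
    # a[0]/b[0] dereference the typed pointers
    return a[0] - b[0]

arr = (ctypes.c_int * 5)(3, 1, 4, 1, 5)
libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), compare)
print(list(arr))  # [1, 1, 3, 4, 5]
```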