just for info, as it has little to do with Racket, but the slowness was due to 2 things:
- an unnecessary deep copy of an object in the Python code (I did not write this code, so I had to find all the caveats inside), but this could happen in many languages, C++ etc., perhaps less in a functional language. The Python code was:
rv=np.array([vect_x, vect_y, vect_z])
which takes 88 seconds; replacing it with:
rv=[vect_x, vect_y, vect_z]
or:
return vect_x, vect_y, vect_z
is immediate, with no latency, and the list or tuple can still be indexed like an array, so there is no further code to modify.
creating rv
vect_x.size= 566231040
returning vect np.array list
DEBUG: appel de list 0.009212017059326172
DEBUG: liste créé 3.0994415283203125e-06
returning vect list
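A minimal sketch of why that single line was so costly (array names and sizes are illustrative, not from the original code): np.array([...]) allocates a new contiguous (3, N) block and copies every element into it, while a list or tuple only stores references to the existing arrays.

```python
import numpy as np

# three large component arrays, standing in for vect_x, vect_y, vect_z
vect_x = np.zeros(1_000_000, dtype=np.float32)
vect_y = np.zeros(1_000_000, dtype=np.float32)
vect_z = np.zeros(1_000_000, dtype=np.float32)

# np.array([...]) builds a new (3, N) buffer and copies all the data
rv_copy = np.array([vect_x, vect_y, vect_z])

# a list (or tuple) just holds references: O(1), no copy at all
rv_list = [vect_x, vect_y, vect_z]

# both can be indexed the same way...
assert rv_copy[0].size == vect_x.size
# ...but only the list still points at the original objects
assert rv_list[0] is vect_x       # same object, nothing copied
assert rv_copy[0] is not vect_x   # the np.array version copied it
```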
- a slow VTK reader in the official package. A bit incredible, but rewriting my own reader, still reading the whole cube of data as in the code below, sped up the reading by a factor of 30:
if nibbles_away and math == 'VECTORS' :
data_file_bin = open(os.path.join(self.address,tar_file+'.vtk'),'rb') # binary mode
print("DEBUG: mod_from : data_file_bin =",data_file_bin)
# skip the text lines
data_file_bin.readline() # comments vtk Datafile version ....
data_file_bin.readline() # textual description
data_format_bin = data_file_bin.readline() # format
print('DEBUG: mod_from : data_format_bin=',data_format_bin)
data_structure_bin = data_file_bin.readline() # structure
print('DEBUG: mod_from : data_structure_bin=',data_structure_bin)
data_dimension_bin = data_file_bin.readline() # reads the dimensions
print('DEBUG: mod_from : data_dimension_bin=',data_dimension_bin)
data_origin_bin=data_file_bin.readline() # ORIGIN here
print('DEBUG: mod_from : data_origin_bin=',data_origin_bin)
data_spacing_bin=data_file_bin.readline() # SPACING ...
print('DEBUG: mod_from : data_spacing_bin=',data_spacing_bin)
data_product_bin=data_file_bin.readline() # NB: here you have the nx*ny*nz product
math_physic_comp_line_bin=data_file_bin.readline() # VECTORS ... float
print('DEBUG: mod_from : math_physic_comp_line_bin=',math_physic_comp_line_bin)
# reading the binary data using numpy.fromfile()
dt = np.dtype('>f4') # force big-endian 32-bit floats
tA = time.time()
vect = np.fromfile(data_file_bin, dtype=dt) # read the binary data
tB = time.time()
print('DEBUG: Time nibbles away mode:',tB-tA)
print("Binary Data size:", vect.size)
print("Binary Data shape:", vect.shape)
print("Binary Data:", vect)
vect=vect.reshape(size_cube,size_element)
print("Binary Data reshape size:", vect.size) # size of one cube
print("Binary Data reshape shape:", vect.shape) # 3 cubes, e.g. for Bx, By, Bz; each cube holds the value of Bx (or By or Bz) at every Point3D of the cube.
print("Binary Data reshape:", vect)
data_file_bin.close()
DEBUG: Time nibbles away mode: 2.7550032138824463
simply reading the VTK file in binary mode with numpy and fromfile was done in less than 3 seconds instead of 98 seconds for the official library... but I had to fix the endianness to make it work, plus a few other tricks...
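For reference, a sketch of what that endianness fix looks like (simulated bytes here, not the original file): legacy binary VTK data is big-endian, so the dtype must say so explicitly, and the array can be converted to native byte order afterwards for faster downstream computation.

```python
import numpy as np

# simulate the data section of a legacy VTK file: big-endian 32-bit floats
raw = np.arange(6, dtype='>f4').tobytes()

vect = np.frombuffer(raw, dtype='>f4')   # correct: honour the big-endian layout
wrong = np.frombuffer(raw, dtype='<f4')  # wrong byte order: garbage values

# convert to the machine's native byte order once, after reading
native = vect.astype(np.float32)
```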
but there is still a possible solution of reading the data partially, in Scheme or Python, to save the last 2 or 3 seconds, but that is another story...
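A possible sketch of that partial read (hypothetical file and sizes, not from the code above): once the header has been skipped, file.seek() plus the count= argument of np.fromfile can pull out just one sub-block of the cube instead of the whole thing.

```python
import numpy as np
import os
import tempfile

dt = np.dtype('>f4')  # big-endian 32-bit floats, as in the reader above

# write a fake binary blob standing in for the data section of a .vtk file
path = os.path.join(tempfile.gettempdir(), 'fake_cube.bin')
full = np.arange(30, dtype=dt)
with open(path, 'wb') as f:
    f.write(full.tobytes())

# read only elements 10..19 instead of the whole cube
start, count = 10, 10
with open(path, 'rb') as f:
    f.seek(start * dt.itemsize)                   # jump past the unwanted prefix
    part = np.fromfile(f, dtype=dt, count=count)  # read just this slice
```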
(sorry for all this Python code above)
Damien