Does defining a function inside a loop incur a time penalty?

hello,

i have a for/list that contains some computation, a simple mathematical expression, that should only be used in the for/list loop, so i can define it in the for/list body, but i'm asking if the defined function will be re-interpreted/compiled each time it loops? i would answer no; i suppose scheme/racket is not only interpreted, not being like C, but definitions are sort of compiled only one time? am i wrong?

regards,
Damien

Give a complete example.

but i'm asking if the defined function will be re-interpreted/compiled each time it loops?
No. Racket compiles the files ahead of time.

For the usual Scheme implementations (Gambit, Chicken Scheme, etc.) the answer is no as well.

i suppose scheme/racket is not only interpreted,
Can you rephrase?

thank you @soegaard , i had a little doubt, now it is clear:

i wanted to say interpreted like the BASIC on an old computer (an Apple II for example); interpreted for me means really at runtime only.

it was the cal procedure in this generated code:

($nfx$
   data-xml
   <-
   (for/list
    ((one-data-row data-interpol) (one-trajectory-row data-trajectory))
    ($nfx$
     (index X_MSO Y_MSO Z_MSO VALx VALy VALz)
     <-
     (apply values (string-split one-data-row)))
    ($nfx$
     (time-stamp x_traj_mso y_traj_mso z_traj_mso)
     <-
     (apply values (string-split one-trajectory-row)))
    (define (cal val) ($nfx$ val * Bsw / B0))
    `(TR
      (TD ,time-stamp)
      (TD ,X_MSO)
      (TD ,Y_MSO)
      (TD ,Z_MSO)
      (TD ,(cal (string->number VALx)))
      (TD ,(cal (string->number VALy)))
      (TD ,(cal (string->number VALz))))))

generated from this scheme+ source code:

{data-xml <- (for/list ([one-data-row data-interpol]
                        [one-trajectory-row data-trajectory])

               ;; take Bo or Vo x,y,z in SimulationData.txt, compute the norm
               ;; compute B or V in the output file like this:
               ;; B_x|y|z_XML = Bintermediate_unit_code_x|y|z * Bsw / |Bo input|   with Bsw = 10 nT

               {(index X_MSO Y_MSO Z_MSO VALx VALy VALz) <- (apply values
                                                                   (string-split one-data-row))}
               {(time-stamp x_traj_mso y_traj_mso z_traj_mso) <- (apply values
                                                                        (string-split one-trajectory-row))}

               (define (cal val) {val * Bsw / B0})

               `(TR (TD ,time-stamp)
                    (TD ,X_MSO)
                    (TD ,Y_MSO)
                    (TD ,Z_MSO)
                    (TD ,(cal (string->number VALx)))
                    (TD ,(cal (string->number VALy)))
                    (TD ,(cal (string->number VALz)))))}
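As the replies below discuss, since Bsw and B0 do not change inside the loop, cal could equally be defined once before the for/list; a sketch of that variant in plain Racket (without the Scheme+ infix braces):

(define (cal val) (/ (* val Bsw) B0))  ; defined once, outside the loop

(for/list ([one-data-row data-interpol]
           [one-trajectory-row data-trajectory])
  ...)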

just for info, it is related to this other post that i'm debugging:

https://racket.discourse.group/t/valid-numeric-entity-refused-by-xexpr-xml/3393

A simpler example:

(for ([i (in-range 5)])
  (define (foo) 42)
  (foo))

Here the local definition of foo is equivalent to:

(for ([i (in-range 5)])
  (letrec ([foo (lambda () 42)])
    (foo)))

In this case a clever compiler could simplify the letrec-expression to 42.
But in the general case, the lambda-expression will result in the allocation of a closure.
[At least if there are free variables in the body.]

Allocation always takes time, so moving the definition of foo outside the loop would be a good idea.

However, there is no time penalty due to "recompilation".
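To see the allocation cost itself, here is a tiny hypothetical micro-benchmark (everything in it is made up for illustration). When the local function closes over the loop variable, a fresh closure may be allocated on each iteration; note that the compiler may still optimize both loops to the same code, as discussed further below.

#lang racket

(time
 (for ([i (in-range 10000000)])
   (define (foo) (+ i 1))   ; closes over i, so may allocate a closure per iteration
   (foo)))

(define (foo-lifted i) (+ i 1))  ; lifted: i is passed as an argument instead
(time
 (for ([i (in-range 10000000)])
   (foo-lifted i)))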

You can compare the expansions like this:

#lang racket
(require racket/pretty)

(pretty-print
 (syntax->datum
  (expand
   '(for ([i (in-range 5)])
      (define (foo) 42)
      (foo)))))

(pretty-print
 (syntax->datum
  (expand
   '(for ([i (in-range 5)])
      (letrec ([foo (lambda () 42)])
        (foo))))))

thank you, i tested the expand, and now i know why scheme is slow :sweat_smile: (compared to C)

i even tried to expand my part of the code above, but the expansion is too big to fit in the 32000-character limit of racket discourse :scream:, here it is:

Do you know what is in data-interpol and data-trajectory? If you know that they are lists you can use

(for/list ([one-data-row (in-list data-interpol)]
           [one-trajectory-row (in-list data-trajectory)])
  ...)

that generates faster code. Right now Racket generates code for a generic sequence, which is much slower than code for a specific one like a list or a vector or ...
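A rough way to see the difference (the data here is made up, so treat it as a sketch):

#lang racket

(define xs (build-list 1000000 values))

(time (for/sum ([x xs]) x))            ; generic sequence dispatch
(time (for/sum ([x (in-list xs)]) x))  ; specialized list iteration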

I agree, but I want to add that the compiler may inline or lift the functions (and use other tricks). In this case the code will be equivalent to

(for ([i (in-range 5)])
  42)

and in a more complicated case it would be like

(define (foo) 42)
(for ([i (in-range 5)])
  (foo))

but sometimes the compiler can't apply any of these tricks and allocates a new function each time.

EDIT: can -> can't


interesting. In my case i know it is a local function that could be 'lifted' before and outside the loop, but for a compiler this sort of thing could be very hard to figure out (for example if there is bad code using global variables, calls to other procedures, etc.).

in my case there is no speed problem: i'm parsing 1442 lines of data (60 min * 24 hours = 1440) of a one-day spacecraft trajectory at a sampling of 60 s. i did not really time it (it should take a few seconds; i will try to add a timer), but the main computation is the Python interpolation, which took 4 minutes (because Python reads a simulated 3D data cube of approximately half a dozen gigabytes, produced by a magneto-hydrodynamic code in C++ that ran on a cluster at another time; that sort of parallel code can take many hours even with MPI). i just added a timer now and the http log of scheme+python gives this:

    [15] => interpole_fields : start of Interpolation code in Python: #(struct:date* 39 15 10 5 12 2024 4 339 #f 3600 908612012 CET)
    [16] => number of lines in stderr=0
    [17] => interpole_fields : end of Interpolation code in Python: #(struct:date* 40 19 10 5 12 2024 4 339 #f 3600 839184999 CET)
    [18] => interpole_fields : output-file: trajectory-near_Mio_B_6000.txt
    [19] => interpole_fields : output-file-with-path: /private/var/tmp/VdROykA_directory/trajectory-near_Mio_B_6000.txt
    [20] => interpole_fields : B0= 0.00562
    [21] => interpole_fields : splitted-lines-constant= ((# constant data of physic) () (# Using the text editor vim, one can produce subscripted and superscripted numbers by using the digraphs control-k-ns for subscription and control-k-nS for superscription, where n is an Arabic numeral.) () (Vsw = 400) (Vsw_unit = km.s⁻¹) () (Bsw = 10) (Bsw_unit = nT) () (ρsw = 10) (ρ_unit = cm⁻³) ())
    [22] => interpole_fields : key-filter= '(("Bsw" "=" "10"))
    [23] => interpole_fields : Bsw= 10
    [24] => interpole_fields : end of post-processing XML file: #(struct:date* 41 19 10 5 12 2024 4 339 #f 3600 405458927 CET)

the parsing and generation of the text and XML files takes at most 1 second in Racket, as shown by the struct:date* displayed with (current-time)

to answer the question, data-interpol and data-trajectory are vectors, but the original data were lists converted with list->vector.

the original code was in Scheme+:

;; reading output file of interpolated data
;; read all lines
{output-file-lines <- (read-lines output-file-with-path)}
{vl <- (list->vector output-file-lines)}

.....

;; read all lines of the trajectory text file to get the timestamps
;; (because we have only an index in the Python output interpolated file)
{input-lines <- (read-lines trajectory-file)}
{vl-trajectory <- (list->vector input-lines)}
{data-trajectory <- vl-trajectory[5 :]} ; skip the header to reach the data lines of the trajectory input file; the trajectory file contains timestamps useful for generating the XML output file

{data-interpol <- vl[1 :]} ; the numerical data lines, without the commented header line, of the simulation output file that contains the physical values for the output XML file

i prefer using vectors as my Scheme+ library can use the same slicing notation as Python, i.e. vl-trajectory[5 :] and vl[1 :]
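For reference, a plain-Racket sketch of those two slices, assuming vector-drop from racket/vector matches the intended "from this index to the end" meaning (the two vectors here are small stand-ins for the real data):

#lang racket
(require racket/vector)

(define vl            (vector "#header" "1 0.1 0.2 0.3" "2 0.4 0.5 0.6"))
(define vl-trajectory (vector "h1" "h2" "h3" "h4" "h5" "2024-12-05T10:15 0 0 0"))

(define data-interpol   (vector-drop vl 1))            ; like vl[1 :]
(define data-trajectory (vector-drop vl-trajectory 5)) ; like vl-trajectory[5 :]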

i'm happy it takes less than one second in Scheme+/Racket; it proves that the macros and procedures i have created with syntax transformers are fast once they have been expanded (and the expansion time is very fast too when i do not display debug info; it is almost immediate)

You can use in-vector instead.
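For example (a sketch; the rest of the loop body stays as in the code above):

(for/list ([one-data-row (in-vector data-interpol)]
           [one-trajectory-row (in-vector data-trajectory)])
  ...)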

Anyway, if the time is a few hours (days?) in C++, 4 minutes in Python and 1 second in Racket, I'd not overoptimize the Racket part.

How hard is it to translate the Python code to Racket? Are you using many libraries?

In my experience, numeric code is faster in Racket, but strings are quite similar or even faster in Python. (One of my coworkers can get fast numeric code in Python using @jit but I never learned the details.)

oh, not exactly that. There is no port from python to racket.

There are many parts:

A trajectory of the Mio orbiter around the planet Mercury:

trajectories can be obtained from here (by a human or with web services):

https://amda.irap.omp.eu/

A simulation code:

which generates a 3D cube of physical data

The goal is to compare the results of the simulation with the real data collected by the instruments of Mio. This requires putting the trajectory in the cube and interpolating the simulation data at the exact positions of the trajectory points.

this is done by a python program, cut_1D (not online), that uses python libraries (numpy, scipy, ...) and fibo:

and we run only the interpolation online, in python (the problem is that it takes 4 minutes or more and needs special timeouts in php, http and firefox, which limit requests to 300 s), and Racket sits between PHP and Python, generating the XML used for the web services.
A problem is that just reading the data cube takes minutes, as it is gigabytes; the interpolation itself, even in Python (not Fortran), is very fast indeed.

That's very interesting. Some questions (perhaps too many :slight_smile: ):

Is the cube the same for all the points in the trajectory? (I guess no.)

Are you using all the data in the cube for the interpolation? (I guess it only uses a few voxels around the point, let's say 10x10x10.) Can you load only a neighborhood of the satellite instead of the whole GB cube?

Is the data in the file of the cube saved in binary or text format? I think we had legacy code in Fortran that used Z16 to save the floating point numbers exactly, but we changed to normal text (I don't remember why. Perhaps because it's easier to find obvious stupid errors and also to cut and paste the intermediate data into other tools.)

Perhaps you can modify the program that creates the cube to pad the numbers with 0 or spaces so all the floating point numbers use the same number of characters; then, if you want to load x[207,182,99], you can read the file in binary mode and read the position (207*1000*1000 + 182*1000 + 99)*17 + 777, where 777 is the size of an imaginary header of the file.
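A minimal sketch of that idea in Racket, using the imaginary numbers from the paragraph above (17-character fixed-width fields, a 777-byte header, a 1000^3 cube; the file name is invented too):

#lang racket

(define width 17)    ; characters per padded number (imaginary)
(define header 777)  ; size of the imaginary header
(define dim 1000)    ; cube dimension (imaginary)

(define (read-cell in i j k)
  ;; seek to the cell's fixed offset and parse its padded text field
  (file-position in (+ header (* width (+ (* i dim dim) (* j dim) k))))
  (string->number (string-trim (bytes->string/utf-8 (read-bytes width in)))))

;; e.g. (call-with-input-file "cube.txt" (lambda (in) (read-cell in 207 182 99)))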

:joy:
the cube is the simulated space (where the satellite moves in its orbit around Mercury); there will be one cube for each physical quantity: magnetic field, speed and rho (good question, i forget a bit what the researchers explained to me; i'm just the guy behind the web application, not a researcher, even if i know electromagnetism, learned physics at university and already worked on a Fortran simulation of stellar physics. rho should be electric density; speed? hum... something from the solar wind, but i could be wrong...). there is no big secret, just comparing real data with the simulation, the scientific approach since Greek antiquity.

It is partially a port of a web app from another laboratory; they used Fortran, not Python.

unfortunately selecting only a few points also forces reading the full cube (which takes 5 minutes) to make the selection

in text it was worse: they gave me text files, many times bigger than the binary ones; i had to convert them to binary, see:

but your idea is interesting; perhaps we can compress the data, i perhaps did not try that (i cannot remember if the gzipped files are shorter), and then modify the python interpolation to deal with compressed data. i can try that, but i set up the web server on my Mac laptop and i do not have it right now. also i noticed that on a desktop PC the file read is faster, 1 minute or 2; if installed on a server this could be faster, unless they virtualise the web server as our sys. admin. often does.

Thanks for your interest.

My guess is that this happens because you are using a standard reader for vtk files from a package.

My recommendation is to write your own reader. It looks like it has a variable-length text header that you should parse, and then a binary blob that is the array. (I think the binary version with no compression is better for reading random points.)

Let's assume you need 10x10x10 points around each of the 1440 measurements: that's only ~1,500,000 flonums to read, instead of the full cube that has something like 1000x1000x1000. Being too optimistic, that is a 1000x speedup, but real life and disk storage are harder, so I guess a 100x speedup is more realistic. So the time would go from 4 minutes to about 2 seconds. Also, a small memory footprint may reduce other problems and make everything faster.

I think it can be implemented in less than a week, so I strongly recommend trying it.

(My recommendation is to load only the 10x10x10 data block, then just drop it and load the next 10x10x10 block. In other scenarios it may be helpful to save them in a hash or something to avoid reading them again later, but in this case the simple solution seems to be fast enough and easier to implement.)
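A hypothetical sketch of that strategy in Racket, assuming an invented layout (8-byte little-endian doubles after a fixed-size header, a 1000^3 cube, an invented file name; none of this is the real format):

#lang racket

(define dim 1000)
(define header 777)

(define (read-value in i j k)
  ;; seek to one flonum and decode it
  (file-position in (+ header (* 8 (+ (* i dim dim) (* j dim) k))))
  (floating-point-bytes->real (read-bytes 8 in) #f))  ; #f = little-endian

(define (read-neighborhood in ci cj ck [r 5])
  ;; read only a (2r)^3 block around (ci, cj, ck), clipped to the cube
  (for*/vector ([i (in-range (max 0 (- ci r)) (min dim (+ ci r)))]
                [j (in-range (max 0 (- cj r)) (min dim (+ cj r)))]
                [k (in-range (max 0 (- ck r)) (min dim (+ ck r)))])
    (read-value in i j k)))

;; e.g. (call-with-input-file "cube.bin" (lambda (in) (read-neighborhood in 207 182 99)))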


It looks like, in addition to the legacy VTK ascii and binary formats and an XML format, there is a VTKHDF binary format that the docs say is designed for high I/O performance and random access.


but you're gonna make me work on a Sunday :joy:

globally your method is a good idea; i haven't yet checked all the details, but it would drop almost all the code written by the PhD student/post-doc for the file reading and interpolation, and anyway i will not take this decision solely by myself.


and XML is the standard for the VO (virtual observatory), whereas vtk is proprietary; for a web app XML is better. Apparently the interpolation code was not written taking this into account. As i already wrote, it is not me who wrote the Python read/interpolation code.