HTML generation: xml library reinterpreting &

Hello,

I have a small technical problem: I want to use Greek letters for physical quantities, and I'm using
(require xml)

I want to output &rho; instead of rho, so I do:

(regexp-replace #rx"rho" physic "\\&rho;")

I solved the problem of & inside the regular-expression replacement string by writing \\&, but then some layer transforms & into &amp; in the HTML page:

<html>
  <style type="text/css">
    table, th, td { border:1px solid black; }
  </style>
  <head>
    <title>
      Plot
    </title>
  </head>
  <body>
    <h1>
      BepiColombo
    </h1>
    <p>
      <center>
        <br />
        <table>
          <tr>
            <th>
              rhoe (&amp;rho;e)
            </th>
          </tr>

My code is:

      `(html
			 (style ((type "text/css")) "table, th, td { border:1px solid black; }")
			 (head (title "Plot"))
			 (body (h1 "BepiColombo")
			       (p
				(center
				 (br)
				 (table
				  (tr
				   (th ,(string-append physic " (" physic-greek4html ")")))

where physic-greek4html is (regexp-replace #rx"rho" physic "\\&rho;")

How can I prevent & from being reinterpreted by Racket's xml library?

I only want &rho; in the HTML source code.
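
For readers who want to reproduce the two effects described above, here is a minimal sketch. It assumes the quantity is the string "rhoe", as in the output shown earlier; the names bare, escaped and page are only illustrative.

```racket
#lang racket/base
(require xml)

;; In a regexp-replace insert string, a bare & stands for the matched text,
;; while \\& (backslash followed by ampersand) inserts a literal ampersand.
(define bare    (regexp-replace #rx"rho" "rhoe" "&rho;"))   ; & expands to "rho"
(define escaped (regexp-replace #rx"rho" "rhoe" "\\&rho;")) ; literal "&rho;"

;; A string inside an xexpr is character data, so its & is escaped on output.
(define page (xexpr->string `(th ,escaped)))

(displayln bare)     ; rhorho;e
(displayln escaped)  ; &rho;e
(displayln page)     ; <th>&amp;rho;e</th>
```

So the regexp step already produces the intended text; it is the later serialization of the string that rewrites the ampersand.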

Regards,

Damien

Hi, @damien_mattei!

I feel a bit ashamed of this, because I caved and asked Gippity what the deal was, but apparently this works:

#lang racket/base

(require xml)

(define physic "rho")
(define physic-greek4html (entity #f #f 'rho))

(define html-code
  `(html
    (head (title "Plot"))
    (body
     (h1 "BepiColombo")
     (p
      (center
       (table
        (tr
         (th ,physic " (",(xml->xexpr physic-greek4html)")"))))))))

(display (xexpr->string html-code))

<html>

<head>
    <title>Plot</title>
</head>

<body>
    <h1>BepiColombo</h1>
    <p>
        <center>
            <table>
                <tr>
                    <th>rho (&rho;)</th>
                </tr>
            </table>
        </center>
    </p>
</body>

</html>

So, I assume that because the object is now an entity, it is not being interpreted as an escapable string anymore and inserted as-is (entities are used as substitutions?). But that is about the extent of my understanding here.


Edit: lol, so you could also just write: 'rho and it would be the same. Apologies for the misdirect.

(define html-code
  `(html
    (head (title "Plot"))
    (body
     (h1 "BepiColombo")
     (p
      (center
       (table
        (tr
         (th ,physic " (",'rho ")"))))))))
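
A short sketch of why this seems to work, assuming the usual xexpr conventions of Racket's xml library: a string is character data and gets escaped, a symbol is written as a named entity reference, and an integer is written as a numeric character reference. The names as-string, as-symbol and as-number are only illustrative.

```racket
#lang racket/base
(require xml)

;; String content: the ampersand is escaped.
(define as-string (xexpr->string '(th "&rho;")))

;; Symbol content: written as a named entity reference.
(define as-symbol (xexpr->string '(th rho)))

;; Integer content: written as a numeric reference (961 is U+03C1, rho).
(define as-number (xexpr->string '(th 961)))

(displayln as-string)  ; <th>&amp;rho;</th>
(displayln as-symbol)  ; <th>&rho;</th>
(displayln as-number)  ; <th>&#961;</th>
```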

Hello @bakgatviooldoos,

I was about to post my own solution when I saw yours.

I'm a bit afraid to see Le Chat (:cat: we have an AI called that in France...) doing better than us... I hope it is only better for technical things, like searching the web for solutions we already found in the past (I tested it on algorithms and found it dumb...).

But yes, your solution works.

I am probably confusing several things in the xml library (xexpr, xml, entity...); they are not always clear to me.

But I found that these work:

(parameterize ([current-unescaped-tags (cons 'th html-unescaped-tags)]) (write-xexpr `(html (th ,(regexp-replace #rx"rho" physic "\\&rho;")))))
<html><th>&rho;e</th></html>

and this:

(parameterize ([current-unescaped-tags (cons 'th html-unescaped-tags)]) (xexpr->string `(html (th ,(regexp-replace #rx"rho" physic "\\&rho;")))))
"<html><th>&rho;e</th></html>"

The only big problem with this is that the HTML source of my page now has no indentation.

[Screenshot 2025-02-20 at 16.17.38]

;; create output HTML file
	(define html-out (open-output-file #:exists 'truncate html-page-path))

	(parameterize ([current-unescaped-tags (cons 'th html-unescaped-tags)]) ; escape table header from the rewrite of ampersand for example
		      ;;(display-xml/content (xexpr->xml html-sexpr)
		      (write-xexpr html-sexpr
		      ;;#:indentation 'classic
		      html-out))

	(close-output-port html-out)

Oh, I now understand the caveat with the Gippity solution: it forces me to code the page almost by hand, replacing the physical quantity by hand everywhere I need it: rho many times, theta many times, and so on...

I prefer the solution with regexp-replace and parameterize, even if it is not elegant that the change applies to every table header.
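
To make that trade-off concrete, here is a small sketch of what adding th to current-unescaped-tags does to other text in a header cell. The cell content "a < b & c" is a made-up example, not taken from the real page.

```racket
#lang racket/base
(require xml)

;; Hypothetical header cell containing characters that normally get escaped.
(define cell '(th "a < b & c"))

;; Default behaviour: < and & are escaped.
(define default-out (xexpr->string cell))

;; With th marked as unescaped, the string is written verbatim.
(define raw-out
  (parameterize ([current-unescaped-tags (cons 'th html-unescaped-tags)])
    (xexpr->string cell)))

(displayln default-out)  ; <th>a &lt; b &amp; c</th>
(displayln raw-out)      ; <th>a < b & c</th>
```

The second form is fine when every header string is known to be safe, but it no longer protects against stray < or & characters in those cells.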

For now we are still a bit better than AI :slightly_smiling_face:, but I would like a simpler solution. I also never managed to use a method that allows classic indentation.

Yes, I found this solution too. It is useful for static data, but otherwise I think I would have to build a more complex solution: splitting and parsing the physical quantity, generating a list L of text pieces and 'rho etc., and unquote-splicing that list into the xexpr at the right place... but that is complex...
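
The splitting idea described above might be sketched as follows. This is only one way to do it: greek-pieces is a hypothetical helper, it handles only "rho", and it relies on add-between from racket/list rather than a hand-written loop.

```racket
#lang racket/base
(require racket/list racket/string xml)

;; Hypothetical helper: split a quantity name on "rho", put the entity
;; symbol 'rho back between the pieces, and drop the empty strings.
(define (greek-pieces name)
  (filter (lambda (piece) (not (equal? piece "")))
          (add-between (string-split name "rho" #:trim? #f) 'rho)))

(define physic "rhoe")

;; Splice the pieces into the header cell next to the plain name.
(define cell `(th ,(string-append physic " (") ,@(greek-pieces physic) ")"))

(displayln (xexpr->string cell))  ; <th>rhoe (&rho;e)</th>
```

Because the Greek letter stays a symbol inside the xexpr, no escaping needs to be switched off for the whole tag.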

I'm glad you came right. Perhaps if you display the xexpr after converting to XML first, using the utility procedure, display-xml/content?

(with-output-to-file "test.html"
  #:exists 'truncate/replace
  (lambda ()
    (display-xml/content
     (xexpr->xml html-code)
     #:indentation 'scan)))

This produces:


<html>
  <head>
    <title>Plot</title>
  </head>
  <body>
    <h1>BepiColombo</h1>
    <p>
      <center>
        <table>
          <tr>
            <th>rho (&rho;)</th>
          </tr>
        </table>
      </center>
    </p>
  </body>
</html>

Plenty of people better than I am, already :wink: Jokes aside, it's probably like my mom used to say about listening to "that little voice": as long as you don't ignore it too much, you'll know when you're being naughty and when it's okay to err.

I can't remember if I had this thought, or if I appropriated it somewhere--likely the latter--but the LLMs seem to be a form of "tree of the knowledge of good and evil". It's not what it gives you (haha, you were naked all along), but what it takes away; which I think is a pretty neat analogy.

Anyways, no need to lose any chatoyance yet, ey.


2 comments (I skimmed the thread).

  1. Quasiquoting doesn't require unquoting to quote.
`(p ,'rho)

is equivalent to

`(p rho)

and avoids the ,' sigil that might be confusing.

  2. Racket's xexpr facilities do a lot of work to help you not hit footguns with injection. If you really need to bypass them, the most reliable way I've found is cdata, but this should be considered a hack.
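
A minimal sketch of both comments, assuming the cdata struct from the xml library is built with two source-location fields (here #f) followed by the string. The names same? and raw-cell are only illustrative.

```racket
#lang racket/base
(require xml)

;; Comment 1: a symbol inside a quasiquote is already quoted.
(define same? (equal? `(p ,'rho) `(p rho)))

;; Comment 2, the cdata hack: the string of a cdata struct is written
;; verbatim, so the ampersand is not escaped. Nothing checks that the
;; string is a real CDATA section, which is why this counts as a hack.
(define raw-cell (xexpr->string `(th ,(cdata #f #f "&rho;e"))))

(displayln same?)     ; #t
(displayln raw-cell)  ; <th>&rho;e</th>
```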

Shameless plug, you may find my HTML5 Printer package useful. See also the section in the docs for that package, Comparing with included Racket functions — in particular, display-xml/content works most of the time, but its indentation may introduce extra whitespace in rendered HTML, and it doesn't wrap long lines.


It's interesting, because Gippity got the right idea in the end, and I'm curious how you, @bakgatviooldoos, formulated the question.

As long as I tried to modify the string myself, the & character was escaped by the xml library as &amp;, so the right idea was to use 'rho instead of &rho; in the string. This works with xml or html-printer (@joeld).

I had to implement what I was describing:

#! /usr/bin/env racket
#lang reader SRFI-105

...

	(require Scheme+)
	(require xml
		 (except-in 2htdp/batch-io xexpr?)) ; for: read-lines

      (require html-printer)


;; convert physical size in latin string to greek characters

	;; > (between '(1 2 3) 'A)
	;; '(1 A 2 A 3)
	;; > (between '(1 2) 'A)
	;; '(1 A 2)
	;; > (between '(1) 'A)
	;; '(1)
	(define (between L elem)
	  (if (null? (cdr L))
	      L
	      (cons (car L)
		    (cons elem
			  (between (cdr L) elem)))))

	(define latin-rho "rho")
	(define sp (string-split physic latin-rho #:trim? #f)) ;; example:  (string-split "rhoe" "rho" #:trim? #f)  --->  '("" "e")
	(define xexpr-size (between sp 'rho))
	(define xexpr-size-clean (remove "" xexpr-size))
	;;{physic-greek <- (regexp-replace #rx"rho" physic "ρ")}
	;;{physic-greek4html <- (regexp-replace #rx"rho" physic "\\&rho;")}
	(display "interpole_fields : physic-greek4html=") (display physic-greek4html) (newline)

....

;; generate HTML page
	{html-sexpr <- `(html
			 (style ((type "text/css")) "table, th, td { border:1px solid black; }")
			 (head (title "Plot"))
			 (body (h1 "BepiColombo")
			       (p
				(center
				 (br)
				 (table
				  (tr
				   ;;(th ,(string-append physic " (" physic-greek4html ")")))
				   (th ,(string-append physic " (" )
				       ,@xexpr-size-clean
				       ")" ))
				  (tr
				   (th ,math))
				  (tr
				   (td ,data-cube-filename))
				  (tr
				   (td ,basename-trajectory-xml))
				  (tr
				   (td ,output-file)))
				 (br)
				 (br)
				 (img ((src ,image-name)))
				 (br)
				 (br)
				 (img ((src ,image-name-distance)))))))}

	;; create output HTML file
	(define html-out (open-output-file #:exists 'truncate html-page-path))

	(display "interpole_fields : html-sexpr=") (display html-sexpr) (newline)

	;; (write-xexpr html-sexpr
	;; 	     html-out)
	
	;;(parameterize ([current-unescaped-tags (cons 'th html-unescaped-tags)]) ; escape table header from the rewrite of ampersand for example
	
	;; (display-xml/content (xexpr->xml html-sexpr)
	;; 		     html-out
	;; 		     #:indentation 'classic)
	
	;;)

	(write (xexpr->html5 html-sexpr)
	       html-out)
	
	(close-output-port html-out)
	


As you can see in the code, I tested three solutions: write-xexpr, display-xml/content (xexpr->xml html-sexpr) and xexpr->html5.

In any case, my xexpr (?) is:

html-sexpr=(html (style ((type text/css)) table, th, td { border:1px solid black; }) (head (title Plot)) (body (h1 BepiColombo) (p (center (br) (table (tr (th rhoe ( rho e ))) (tr (th SCALARS)) (tr (td Data/Dipole3D_rhoe0_6000.vtk)) (tr (td V504EwE.xml)) (tr (td trajectory-near_Mio_rhoe_6000.txt))) (br) (br) (img ((src V504EwE.jpeg))) (br) (br) (img ((src V504EwE-distance.jpeg)))))))

The rendering of rho is as expected with all of them, except that display-xml/content adds a space:

[Screenshot 2025-02-21 at 18.56.36]

which is not the case with the others:

[Screenshot 2025-02-21 at 18.57.44]

And indentation of the HTML source is only possible with display-xml/content.

I was not able to get a correct web page with xexpr->html5:

I don't know what happened...
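
One possible explanation, offered as a guess rather than a diagnosis: assuming xexpr->html5 returns the page as a string (an assumption about the html-printer package, which is not loaded here), write prints that string in its read-able form, with surrounding double quotes and escaped newlines, whereas display prints the raw characters. The sketch below uses a plain stand-in string to show the difference.

```racket
#lang racket/base
(require racket/port)

;; Stand-in for the page string; the real one would come from the printer.
(define page "<p>\n  hi\n</p>")

;; write: quoted and escaped, as a Racket string literal.
(define written (with-output-to-string (lambda () (write page))))

;; display: the characters themselves, newlines included.
(define displayed (with-output-to-string (lambda () (display page))))

(displayln written)
(displayln displayed)
```

If that is what happened, replacing write with display (or write-string) in the output step would be the thing to try.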

It is a long road to write an HTML page in Racket :sweat_smile:

I will use the one-line version of the HTML source (nobody will check it), which displays better:

Very plainly, as you'll see. You'll also note it gave a slightly off answer, but I could infer what it meant, so this feels nit-picky to mention.