HTML generation: xml library re-escaping &

Hello,

Just a small technical problem: I want to use Greek letters for physical quantities, and I'm using
(require xml)

I want to use &rho; instead of rho, so I do:

(regexp-replace #rx"rho" physic "\\&rho;")

I solved the problem of & in the replacement string with \\& (a bare & there means "the matched text"), but then some layer transforms & into &amp; in the HTML page:

<html>
  <style type="text/css">
    table, th, td { border:1px solid black; }
  </style>
  <head>
    <title>
      Plot
    </title>
  </head>
  <body>
    <h1>
      BepiColombo
    </h1>
    <p>
      <center>
        <br />
        <table>
          <tr>
            <th>
              rhoe (&amp;rho;e)
            </th>
          </tr>

My code is:

      `(html
			 (style ((type "text/css")) "table, th, td { border:1px solid black; }")
			 (head (title "Plot"))
			 (body (h1 "BepiColombo")
			       (p
				(center
				 (br)
				 (table
				  (tr
				   (th ,(string-append physic " (" physic-greek4html ")")))

where physic-greek4html is (regexp-replace #rx"rho" physic "\\&rho;")

How can I prevent & from being re-escaped by Racket's xml library?

I only want &rho; in the HTML source code.

Regards,

Damien

Hi, @damien_mattei!

I feel a bit ashamed of this, because I caved and asked Gippity what the deal was, but apparently this works:

#lang racket/base

(require xml)

(define physic "rho")
(define physic-greek4html (entity #f #f 'rho))

(define html-code
  `(html
    (head (title "Plot"))
    (body
     (h1 "BepiColombo")
     (p
      (center
       (table
        (tr
         (th ,physic " (",(xml->xexpr physic-greek4html)")"))))))))

(display (xexpr->string html-code))
<html>

<head>
    <title>Plot</title>
</head>

<body>
    <h1>BepiColombo</h1>
    <p>
        <center>
            <table>
                <tr>
                    <th>rho (&rho;)</th>
                </tr>
            </table>
        </center>
    </p>
</body>

</html>

So, I assume that because the object is now an entity, it is not being interpreted as an escapable string anymore and inserted as-is (entities are used as substitutions?). But that is about the extent of my understanding here.


Edit: lol, so you could also just write: 'rho and it would be the same. Apologies for the misdirect.

(define html-code
  `(html
    (head (title "Plot"))
    (body
     (h1 "BepiColombo")
     (p
      (center
       (table
        (tr
         (th ,physic " (",'rho ")"))))))))
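
For anyone landing here later, a minimal sketch of why the symbol works where the string does not. In an X-expression, a string is character data (so & gets escaped on output), a symbol is a named entity reference, and an exact integer is a numeric character reference. The variable names below are only for illustration.

```racket
#lang racket/base
(require xml)

;; A string is character data: "&" is escaped on output.
(define as-string  (xexpr->string '(th "&rho;")))  ; "<th>&amp;rho;</th>"
;; A symbol is written as a named entity reference.
(define as-symbol  (xexpr->string '(th rho)))      ; "<th>&rho;</th>"
;; An exact integer is written as a numeric character reference (U+03C1 = 961).
(define as-number  (xexpr->string '(th 961)))      ; "<th>&#961;</th>"
;; A literal Unicode character inside a string passes through unchanged.
(define as-unicode (xexpr->string '(th "ρ")))      ; "<th>ρ</th>"

(for-each displayln (list as-string as-symbol as-number as-unicode))
```

The last form assumes the page is served with a Unicode encoding such as UTF-8; in that case no entity is needed at all.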

Hello @bakgatviooldoos ,

I was about to post my own solution when I saw yours.

I'm afraid to see Le Chat (:cat: we have an AI called that in France...) doing better than us... I hope it is only better for technical things, where we could otherwise search the web for our own past solutions... (I tested it on algorithms and found it dumb...)

But yes, your solution works.

I'm probably mixing up a lot of things in the xml library: xexpr, xml, entity... it is not always clear to me.

But I found these work:

(parameterize ([current-unescaped-tags (cons 'th html-unescaped-tags)]) (write-xexpr `(html (th ,(regexp-replace #rx"rho" physic "\\&rho;")))))
<html><th>&rho;e</th></html>

and this:

(parameterize ([current-unescaped-tags (cons 'th html-unescaped-tags)]) (xexpr->string `(html (th ,(regexp-replace #rx"rho" physic "\\&rho;")))))
"<html><th>&rho;e</th></html>"

The only big problem with this is that the HTML source of my page now has no indentation.

[Screenshot 2025-02-20 at 16.17.38]

;; create output HTML file
	(define html-out (open-output-file #:exists 'truncate html-page-path))

	(parameterize ([current-unescaped-tags (cons 'th html-unescaped-tags)]) ; escape table header from the rewrite of ampersand for example
		      ;;(display-xml/content (xexpr->xml html-sexpr)
		      (write-xexpr html-sexpr
		      ;;#:indentation 'classic
		      html-out))

	(close-output-port html-out)

Oh, I understand now the caveat with the Gippity solution: it forces me to code almost the whole page by hand, changing the physical quantity by hand everywhere I need it, rho many times, theta many times, and so on...

I prefer the solution with regexp-replace and parameterize, even if it is not elegant that the change applies to every table header.
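
A small sketch of what that trade-off means in practice, assuming the same parameterize form as above: once th is in current-unescaped-tags, every string directly inside a th is written raw, not just the &rho; part. The sample text is made up for illustration.

```racket
#lang racket/base
(require xml)

;; Hypothetical header text containing characters that normally need escaping.
(define risky-text "a < b & c")

;; Default behavior: "<" and "&" are escaped.
(define escaped
  (xexpr->string `(th ,risky-text)))

;; With th unescaped: the string is written as-is, so any stray
;; "<" or "&" in the data ends up verbatim in the HTML source.
(define unescaped
  (parameterize ([current-unescaped-tags (cons 'th html-unescaped-tags)])
    (xexpr->string `(th ,risky-text))))

(displayln escaped)
(displayln unescaped)
```

So this route is fine as long as the header strings are fully under your control, but it removes the injection protection for those cells.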

For now we are still a bit better than AI :slightly_smiling_face:, but I would like a simpler solution. I also never managed to use a method that allows classic indentation.

Yes, I found this solution too; it is useful for static data. Perhaps I could build a more complex solution: split and parse the physical quantity, generate a list L of strings and 'rho etc., and unquote-splice the list into the xexpr at the right place... but that is complex...

I'm glad you came right. Perhaps if you display the xexpr after converting to XML first, using the utility procedure, display-xml/content?

(with-output-to-file "test.html"
  #:exists 'truncate/replace
  (lambda ()
    (display-xml/content
     (xexpr->xml html-code)
     #:indentation 'scan)))

This produces:


<html>
  <head>
    <title>Plot</title>
  </head>
  <body>
    <h1>BepiColombo</h1>
    <p>
      <center>
        <table>
          <tr>
            <th>rho (&rho;)</th>
          </tr>
        </table>
      </center>
    </p>
  </body>
</html>

Plenty of people better than I am, already :wink: Jokes aside, it's probably like my mom used to say about listening to "that little voice": as long as you don't ignore it too much, you'll know when you're being naughty and when it's okay to err.

I can't remember if I had this thought, or if I appropriated it somewhere--likely the latter--but the LLMs seem to be a form of "tree of the knowledge of good and evil". It's not what it gives you (haha, you were naked all along), but what it takes away; which I think is a pretty neat analogy.

Anyways, no need to lose any chatoyance yet, ey.


2 comments (I skimmed the thread).

  1. Quasiquoting doesn't require unquoting to quote.
`(p ,'rho)

is equivalent to

`(p rho)

and avoids the ,' sigil that might be confusing.

  2. Racket's xexpr facilities do a lot of work to help you not hit footguns with injection. If you really need to bypass them, the most reliable way I've found is cdata, but this should be considered a hack.

Shameless plug, you may find my HTML5 Printer package useful. See also the section in the docs for that package, Comparing with included Racket functions — in particular, display-xml/content works most of the time, but its indentation may introduce extra whitespace in rendered HTML, and it doesn't wrap long lines.


It's interesting, because Gippity got the right idea in the end, and I'm curious how you, @bakgatviooldoos, formulated the question?

As long as I tried to modify the string myself, the & character was escaped by the xml library as &amp;, so the right idea was to use 'rho in the xexpr instead of &rho; in the string. This works with xml or html-printer (@joeld).

I had to implement what I was describing:

#! /usr/bin/env racket
#lang reader SRFI-105

...

	(require Scheme+)
	(require xml
		 (except-in 2htdp/batch-io xexpr?)) ; for: read-lines

      (require html-printer)


;; convert physical size in latin string to greek characters

	;; > (between '(1 2 3) 'A)
	;; '(1 A 2 A 3)
	;; > (between '(1 2) 'A)
	;; '(1 A 2)
	;; > (between '(1) 'A)
	;; '(1)
	(define (between L elem)
	  (if (null? (cdr L))
	      L
	      (cons (car L)
		    (cons elem
			  (between (cdr L) elem)))))

	(define latin-rho "rho")
	(define sp (string-split physic latin-rho #:trim? #f)) ;; example:  (string-split "rhoe" "rho" #:trim? #f)  --->  '("" "e")
	(define xexpr-size (between sp 'rho))
	(define xexpr-size-clean (remove "" xexpr-size))
	;;{physic-greek <- (regexp-replace #rx"rho" physic "ρ")}
	;;{physic-greek4html <- (regexp-replace #rx"rho" physic "\\&rho;")}
	(display "interpole_fields : physic-greek4html=") (display physic-greek4html) (newline)

....

;; generate HTML page
	{html-sexpr <- `(html
			 (style ((type "text/css")) "table, th, td { border:1px solid black; }")
			 (head (title "Plot"))
			 (body (h1 "BepiColombo")
			       (p
				(center
				 (br)
				 (table
				  (tr
				   ;;(th ,(string-append physic " (" physic-greek4html ")")))
				   (th ,(string-append physic " (" )
				       ,@xexpr-size-clean
				       ")" ))
				  (tr
				   (th ,math))
				  (tr
				   (td ,data-cube-filename))
				  (tr
				   (td ,basename-trajectory-xml))
				  (tr
				   (td ,output-file)))
				 (br)
				 (br)
				 (img ((src ,image-name)))
				 (br)
				 (br)
				 (img ((src ,image-name-distance)))))))}

	;; create output HTML file
	(define html-out (open-output-file #:exists 'truncate html-page-path))

	(display "interpole_fields : html-sexpr=") (display html-sexpr) (newline)

	;; (write-xexpr html-sexpr
	;; 	     html-out)
	
	;;(parameterize ([current-unescaped-tags (cons 'th html-unescaped-tags)]) ; escape table header from the rewrite of ampersand for example
	
	;; (display-xml/content (xexpr->xml html-sexpr)
	;; 		     html-out
	;; 		     #:indentation 'classic)
	
	;;)

	(write (xexpr->html5 html-sexpr)
	       html-out)
	
	(close-output-port html-out)
	


As you can see in the code, I tested 3 solutions: write-xexpr, display-xml/content (xexpr->xml html-sexpr), and xexpr->html5.
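
The between/string-split code above handles a single name. A possible generalization to several Greek letters, as a sketch only: greek-rx and greekify are hypothetical names, and the list of letters is an assumption. It relies on regexp-match* with #:gap-select? #t, which returns the gaps and the matches interleaved, starting and ending with a gap.

```racket
#lang racket/base
(require xml)

;; Hypothetical list of entity names to recognize; extend as needed.
(define greek-rx #rx"rho|theta|phi")

;; Turn "rhoe" into '(rho "e"): matched names become entity symbols,
;; everything else stays a string. Empty gaps are dropped.
(define (greekify str)
  (define parts (regexp-match* greek-rx str #:gap-select? #t))
  (for/list ([part (in-list parts)]
             [i (in-naturals)]
             #:unless (equal? part ""))
    ;; Even positions are gaps (plain text), odd positions are matches.
    (if (odd? i) (string->symbol part) part)))

(define header
  (xexpr->string `(th "rhoe (" ,@(greekify "rhoe") ")")))

(displayln header)
```

Note that this matches the names anywhere in the string, so a word such as "rhombus" would also be rewritten; a stricter pattern would be needed if that can occur in the data.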

In any case, my xexpr (?) is:

html-sexpr=(html (style ((type text/css)) table, th, td { border:1px solid black; }) (head (title Plot)) (body (h1 BepiColombo) (p (center (br) (table (tr (th rhoe ( rho e ))) (tr (th SCALARS)) (tr (td Data/Dipole3D_rhoe0_6000.vtk)) (tr (td V504EwE.xml)) (tr (td trajectory-near_Mio_rhoe_6000.txt))) (br) (br) (img ((src V504EwE.jpeg))) (br) (br) (img ((src V504EwE-distance.jpeg)))))))

ρ is displayed as expected with all of them, except that display-xml/content adds a space:

[Screenshot 2025-02-21 at 18.56.36]

which is not the case with the others:

[Screenshot 2025-02-21 at 18.57.44]

and indentation of the HTML code is only possible with display-xml/content.

I was not able to get a correct web page with xexpr->html5:

I don't know what happened...

It's a long road to write an HTML page in Racket :sweat_smile:

I will use the one-line HTML version of the page (nobody will check it), which displays better:

Very plainly, as you'll see. You'll also note it gave a slightly off answer, but I could infer what it meant, so this feels nit-picky to mention.

Yes, ChatGPT is great at pointing answers in the right direction, but it seems fuzzy on precision and needs small corrections (about exact syntax, for example).

Thank you, I will also try the solution with CDATA that @benknoble mentioned and that GPT explains.

But for small special portions of HTML it is sometimes easier to generate them with strings. The hard part was finding the doc for CDATA:

,(make-cdata #f #f
				  (string-append physic " (" physic-greek4html-sup ")"))

https://docs.racket-lang.org/xml/index.html#(def._((lib._xml%2Fmain..rkt)._make-cdata))

I did not understand why I had to add #f #f as parameters; the doc is not clear on that. Anyway, I personally have a lot of trouble understanding the Racket docs the way they are presented; I remember the C docs in Linux man pages being more understandable.
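
On the #f #f question, a hedged reading of the docs: cdata appears to be a sub-struct of source, whose two fields are the start and stop locations of the item in parsed input. A value built by hand has no source location, hence #f for both. The sketch below shows the hack in isolation; raw-cell is just an illustrative name.

```racket
#lang racket/base
(require xml)

;; First two arguments: start and stop source locations (none here, so #f).
;; Third argument: the text, which the writer emits verbatim.
(define raw-cell (make-cdata #f #f "&rho;e<sup>-</sup>"))

(define cell
  (xexpr->string `(th "rhoe (" ,raw-cell ")")))

(displayln cell)
```

The string is normally expected to be a real <![CDATA[...]]> section; putting arbitrary markup there works because it is written as-is, which is why it was described above as a hack.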

Again, ChatGPT was very fuzzy and wrong about how to use CDATA:

CDATA made the transformation of e- in the physical quantity easy:

(define latin-rho "rho")
(define sp (string-split physic latin-rho #:trim? #f)) ;; example:  (string-split "rhoe" "rho" #:trim? #f)  --->  '("" "e")
(define xexpr-size (between sp 'rho))
(define xexpr-size-clean (remove "" xexpr-size))
{physic-greek <- (regexp-replace #rx"rho" physic "ρ")}
{physic-greek-sup <- (regexp-replace #rx"e" physic-greek "e⁻")}
{physic-greek4html <- (regexp-replace #rx"rho" physic "\\&rho;")}
{physic-greek4html-sup <- (regexp-replace #rx"e" physic-greek4html "e<sup>-</sup>")}

Even if it would have been possible to create the sup HTML tag with code (using an enhanced between procedure), replacing in a string was easier.

For the extra space, I found the solution: changing the indentation from 'classic to 'scan. But I think it is a bug, because the indentation style should not add extra space.

(display-xml/content (xexpr->xml html-sexpr)
			     html-out
			     #:indentation 'scan
			     ;;'classic
			     )

and now the code is well indented:

From the last part of the documentation for display-xml/content:

Be warned that even 'scan does not handle HTML with 100% accuracy. The following example will be incorrectly rendered as "no body" instead of "nobody":

Examples:

(define html-data '(span (i "no") (b "body")))
(show 'scan html-data)
<span>
  <i>no</i>
  <b>body</b>
</span>

This is why xexpr->html5 is better for HTML strings.


If you could give a minimal/simple example of the problem that runs in #lang racket and shows the input X-expression and the output HTML string, I would be able to determine if it was a problem with my library. A screenshot of the browser output isn’t enough to go on. I made several attempts to reconstruct something using your supplied code but there were too many missing bindings.

Perhaps it is just all those \n causing the problem...

I don't have access right now to the development host where all the code is, but with the info in Discourse I managed to reproduce the generated HTML string on my current host:

> (define xexpr `(html
			 (style ((type "text/css")) "table, th, td { border:1px solid black; }")
			 (head (title "Plot"))
			 (body (h1 "BepiColombo")
			       (p
				(center
				 (br)
				 (table
				  (tr
				   ;;(th ,(string-append physic " (" physic-greek4html ")")))
				   (th "blabla" ))
				  (tr
				   (th "SCALARS"))
				  (tr
				   (td "data-cube-filename"))
				  (tr
				   (td "basename-trajectory-xml"))
				  (tr
				   (td "output-file")))
				 (br)
				 (br)
				 (img ((src "image-name")))
				 (br)
				 (br)
				 (img ((src "image-name-distance"))))))))


> (require html-printer)


(require html-printer)
standard-module-name-resolver: collection not found
  for module path: html-printer
  collection: "html-printer"
  in collection directories:
   /home/mattei/.local/share/racket/8.14/collects
   /home/mattei/racket/collects/
   ... [176 additional linked and package directories] in: html-printer
  packages that provide the missing module: .
    html-printer  .
> (require html-printer)


(require html-printer)


#<eof>
> (write (xexpr->html5 xexpr))


(write (xexpr->html5 xexpr))
"<!DOCTYPE html>\n<html>\n  <style type=\"text/css\">table, th, td { border:1px solid black; }</style>\n  <head>\n    <title>Plot</title>\n  </head>\n  <body>\n    <h1>BepiColombo</h1>\n    <p><center><br>\n    <table>\n      <tr>\n        <th>blabla</th>\n      </tr>\n      <tr>\n        <th>SCALARS</th>\n      </tr>\n      <tr>\n        <td>data-cube-filename</td>\n      </tr>\n      <tr>\n        <td>basename-trajectory-xml</td>\n      </tr>\n      <tr>\n        <td>output-file</td>\n      </tr>\n    </table>\n<br>\n    <br>\n    <img src=\"image-name\"><br>\n    <br>\n    <img src=\"image-name-distance\"></center></p>\n  </body>\n</html>\n"

As I'm using my #lang SRFI-105 parser, I redo it below with #lang racket, but I suppose it will be the same:

#lang racket
(require html-printer)
(define xexpr `(html
			 (style ((type "text/css")) "table, th, td { border:1px solid black; }")
			 (head (title "Plot"))
			 (body (h1 "BepiColombo")
			       (p
				(center
				 (br)
				 (table
				  (tr
				   ;;(th ,(string-append physic " (" physic-greek4html ")")))
				   (th "blabla" ))
				  (tr
				   (th "SCALARS"))
				  (tr
				   (td "data-cube-filename"))
				  (tr
				   (td "basename-trajectory-xml"))
				  (tr
				   (td "output-file")))
				 (br)
				 (br)
				 (img ((src "image-name")))
				 (br)
				 (br)
				 (img ((src "image-name-distance"))))))))

(write (xexpr->html5 xexpr))

If it's just that you don't want literal \n in your output string, you should use display instead of write in the last expression. write prints the value the way it would be written in Racket code (that is, “in such a way that instances of core datatypes can be read back in [using racket’s default reader]” per the docs).
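
A tiny illustration of the difference, with a made-up string standing in for the real xexpr->html5 result:

```racket
#lang racket/base
(require racket/port)

;; Stand-in for the string returned by the HTML printer.
(define html "<p>\n  hello\n</p>\n")

;; write: Racket syntax, with surrounding quotes and \n escape sequences.
(define written   (with-output-to-string (lambda () (write html))))
;; display: the raw characters, with real line breaks.
(define displayed (with-output-to-string (lambda () (display html))))

(displayln written)
(displayln displayed)
```

The same applies when writing to a file port: (display str port) gives the HTML itself, while (write str port) gives a quoted Racket string literal that a browser will not render as intended.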

ok :sweat_smile:

(display (xexpr->html5 xexpr))
<!DOCTYPE html>
<html>
  <style type="text/css">table, th, td { border:1px solid black; }</style>
  <head>
    <title>Plot</title>
  </head>
  <body>
    <h1>BepiColombo</h1>
    <p><center><br>
    <table>
      <tr>
        <th>blabla</th>
      </tr>
      <tr>
        <th>SCALARS</th>
      </tr>
      <tr>
        <td>data-cube-filename</td>
      </tr>
      <tr>
        <td>basename-trajectory-xml</td>
      </tr>
      <tr>
        <td>output-file</td>
      </tr>
    </table>
<br>
    <br>
    <img src="image-name"><br>
    <br>
    <img src="image-name-distance"></center></p>
  </body>
</html>
> 

I will test that in real life tomorrow...
Thanks.

I've pushed a fix for that indent issue with that <br> tag following </table>.


Ah, I had not noticed the mis-indented <br>.

With display it is OK now: