Python string indexes

Hello,

I learned some Python and I discovered that it uses negative indexes when dealing with strings.

Something like this:

(substring “racket.discourse.group” 7 -6)
=> “discourse”

(substring “racket.discourse.group” -5)
=> “group”
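
For reference, this is roughly how the same two examples look in Python itself, using slice notation. A quick sketch; the variable names are only for illustration:

```python
s = "racket.discourse.group"

# A negative end index counts back from the end of the string.
middle = s[7:-6]

# A negative start index with no end takes the last characters.
tail = s[-5:]

print(middle)  # discourse
print(tail)    # group
```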

I think it’s nice. What do you think?

Matteo

Quite a few languages support negative offsets like that, but not Racket or Scheme (not even in any of the string SRFIs, I believe). It's easy enough to implement, though. Example module:

#lang racket/base

(require racket/contract (only-in racket/base [substring rkt:substring]))
(provide
 (contract-out
    (substring (->* (string?) (exact-integer? exact-integer?) string?))))

(define (normalize-ranges s start end)
  (values
   (if (< start 0)
       (+ (string-length s) start)
       start)
   (if (< end 0)
       (+ (string-length s) end)
       end)))

(define (substring s [start 0] [end (string-length s)])
  (let-values ([(start end) (normalize-ranges s start end)])
    (rkt:substring s start end)))

(module+ test
  (require rackunit)
  (check-equal? (substring "racket.discourse.group" 7 -6) "discourse")
  (check-equal? (substring "racket.discourse.group" -5) "group"))

You have smart quotes in your code snippets, btw, which will throw off anyone trying to copy and paste them. Like me for the test cases....

I like the general idea, but I don't like the ambiguity of 0.

#lang racket
(require magic/compatibility/phyton)

(define (test s n)
  (python-substring s n (- n)))  ; in Racket, -n would be an identifier, so negate explicitly

(define S "Hello, Word!")

(test S 2) ; ==> "llo, Wor"
(test S 1) ; ==> "ello, Word"
(test S 0) ; ==> ""            <-- !!! :(
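
The same thing can be reproduced in real Python, since -0 is just 0 there. A minimal sketch; strip_ends is a made-up name for illustration:

```python
def strip_ends(s, n):
    # Naive "drop n characters from each end" using a negative end index.
    return s[n:-n]

S = "Hello, Word!"

results = [strip_ends(S, n) for n in (2, 1, 0)]
print(results)  # ['llo, Wor', 'ello, Word', '']
```

With n = 0 the slice becomes S[0:0], which is empty rather than the whole string.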

To solve this problem, normalize-ranges should check whether end is less than 1 rather than less than 0.

You could if you wanted to, of course, but I deliberately didn't do any checking for things like out-of-bounds indexes. I left that to the original substring, which is going to do it anyway.

#lang racket
(require magic/compatibility/phyton/other)

(define (test s n)
  (python-substring/other s 0 n))

(define S "Hello, Word!")

(test S 2) ; ==> "He"
(test S 1) ; ==> "H"
(test S 0) ; ==> "Hello, Word!"            <-- !!! :(

I really like negative indexes, but the corner cases for 0 annoy me.
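
For what it's worth, Python code usually sidesteps this corner case by not relying on a negative end index when the count might be 0. A sketch of two ways to do that, with hypothetical helper names:

```python
def trim(s, n):
    # Compute the end index explicitly, so n == 0 keeps the whole string.
    return s[n:len(s) - n]

def trim_idiom(s, n):
    # "-n or None" turns -0 (which is just 0, and falsy) into None,
    # and a None end index means "up to the end of the string".
    return s[n:(-n or None)]

S = "Hello, Word!"
print(trim(S, 2))        # llo, Wor
print(trim(S, 0))        # Hello, Word!
print(trim_idiom(S, 0))  # Hello, Word!
```

Both assume n is not negative.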

I really like negative indexes, but the corner cases for 0 annoy me.

Me too. That's probably why Racket doesn't have negative indexes.

This discussion reminds me of the blog post

http://scheme.dk/blog/2007/04/writing-spelling-corrector-in-plt.html

in which I ported Norvig's spelling checker from Python to Scheme.

In the blog post I use a little utility concat you might find useful.

The list comprehensions (list-ec etc.) are from SRFI 42. Today I would have used for/list and friends.

I like the general idea, but I don't like the ambiguity of 0.

No ambiguity. 0 is not negative and points to the first char in string.

Python-style slicing also allows the second index to be larger than the string length:

>>> "Hello"[0:1000]
'Hello'

So a correct implementation should be

(define (normalize-ranges s start end)
  (define l (string-length s))
  (values
    (if (< start 0) (+ l start) (min l start))
    (if (< end 0) (+ l end) (min l end))))

Still not right. :wink:

$ racket
Welcome to Racket v8.8 [cs].
> (define (normalize-ranges s start end)  ; copied from Discourse thread
    (define l (string-length s))
    (values
      (if (< start 0) (+ l start) (min l start))
      (if (< end 0) (+ l end) (min l end))))
> (define s "Hello")
> (normalize-ranges s -100 -90)
-95
-85
> (normalize-ranges s -100 3)
-95
3
> (normalize-ranges s 5 3)
substring: ending index is smaller than starting index
  ending index: 3
  starting index: 5
  valid range: [0, 5]
  string: "Hello"
 [,bt for context]

Compare:

$ python
Python 3.11.4 (main, Jun  7 2023, 00:00:00) [GCC 13.1.1 20230511 (Red Hat 13.1.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "Hello"
>>> s[-100:-90]
''
>>> s[-100:3]
'Hel'
>>> s[5:3]
''

See Common sequence operations in the Python documentation.

(define (normalize-ranges s start end)  ; copied from Discourse thread
  (define l (string-length s))
  (define (over ss) (if (< ss 0) (max 0 (+ l ss)) (min l ss)))
  (define start* (over start))
  (define end* (max start* (over end)))
  (values start* end*))
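
One way to gain some confidence in that last version is to port the same clamping logic to Python and compare it against the built-in slicing over a range of indexes. This is only a sketch: normalize_ranges is my own transliteration of the Racket function above, not anything from the Python library.

```python
def normalize_ranges(s, start, end):
    n = len(s)

    def over(i):
        # Negative indexes count from the end, clamped at 0;
        # non-negative indexes are clamped at the string length.
        return max(0, n + i) if i < 0 else min(n, i)

    start_ = over(start)
    # Never let the end fall before the start, so the result is "" instead of an error.
    end_ = max(start_, over(end))
    return start_, end_

s = "Hello"

# Compare against Python's own slicing for every start/end pair in a small window.
for start in range(-8, 9):
    for end in range(-8, 9):
        lo, hi = normalize_ranges(s, start, end)
        assert s[lo:hi] == s[start:end], (start, end)

print(normalize_ranges(s, -100, -90))  # (0, 0)
print(normalize_ranges(s, -100, 3))    # (0, 3)
print(normalize_ranges(s, 5, 3))       # (5, 5)
```

The three printed cases are the ones that went wrong in the earlier transcript.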