Redis client library for Tcl: View Ticket
Ticket Hash: 70c08b5b5dda63f69765741de5cb32b6ebb59c3a
Title: Unicode characters break RESP encoding
Status: Fixed
Type: Code_Defect
Severity: Important
Priority: Immediate
Subsystem:
Resolution: Fixed
Last Modified: 2024-05-22 11:51:48
Version Found In: 0.4.0
User Comments:
anonymous added on 2023-12-29 08:10:31:

Script to reproduce:

package require retcl
retcl create redis

foreach ghost [list wooo \U0001F47B] {
    puts $ghost
    redis -sync set ghost $ghost ex 600
    puts [redis -sync get ghost]
}

Running this gives a crash:

colin@deb2:~/tcl$ ./redis_bug
wooo
wooo
👻
Disconnected
    while executing
"{*}$errorCallback $msg"
    (class "::retcl" method "Error" line 2)
    invoked from within
"my Error {Disconnected}"
    (class "::retcl" method "result" line 40)
    invoked from within
"my result $cmdId"
    (class "::retcl" method "unknown" line 44)
    invoked from within
"redis -sync get ghost"
    ("foreach" body line 4)
    invoked from within
"foreach ghost [list wooo \U0001F47B] {
    puts $ghost
    redis -sync set ghost $ghost ex 600
    puts [redis -sync get ghost]
}"

When trying to track down this problem, sometimes I was seeing an error of the form:

Protocol error: expected '$', got 'e'

This led me to https://stackoverflow.com/questions/72789950/error-with-redis-protocol-resp-during-bulk-load-when-data-contains-utf-8-chara, which reports a similar problem, though not one involving Tcl.

I'm not sure what the best fix would be for this. For now I have worked around it in my application by making sure I don't try to store characters outside the 8-bit range. I think it does need a fix in Retcl though.
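
One plausible mechanism, consistent with the linked Stack Overflow question but not confirmed against the retcl source: a RESP bulk string is sent as `$<length>\r\n<data>\r\n`, where the length counts bytes. If the length is computed from the number of characters while the data goes out as multi-byte UTF-8, the server reads too few bytes and misparses the rest. The sketch below shows the mismatch; `respBulk` is a hypothetical helper for illustration, not part of retcl.

```tcl
# Hypothetical helper, not part of retcl: frames one value as a RESP bulk string.
proc respBulk {value} {
    # Convert to UTF-8 first so that the length prefix counts bytes, not characters.
    set bytes [encoding convertto utf-8 $value]
    return "\$[string length $bytes]\r\n$bytes\r\n"
}

set s "caf\u00e9"
puts "characters:  [string length $s]"                              ;# 4
puts "UTF-8 bytes: [string length [encoding convertto utf-8 $s]]"   ;# 5
```

A length prefix of 4 for this value would leave one byte of the payload unread, which could explain an error such as "expected '$', got 'e'".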


gahr added on 2024-04-22 12:37:54:

As discussed in the Tcl Chatroom, I think it's best if any conversion is done client side.
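
A minimal sketch of what client-side conversion could look like, assuming retcl passes byte strings through unchanged in both directions. The wrapper names `toRedis` and `fromRedis` are illustrative and not part of retcl.

```tcl
# Hypothetical wrappers; the names are illustrative and not part of retcl.

# Encode a Tcl string as UTF-8 bytes before handing it to the client.
proc toRedis {str} {
    return [encoding convertto utf-8 $str]
}

# Decode UTF-8 bytes read back from Redis into a Tcl string.
proc fromRedis {bytes} {
    return [encoding convertfrom utf-8 $bytes]
}

# Usage with the commands from the report (needs a running Redis server):
#   redis -sync set ghost [toRedis $ghost] ex 600
#   puts [fromRedis [redis -sync get ghost]]
```

Keeping the conversion in the application leaves the choice of encoding to the caller, and values that are already binary can be stored without passing through the wrappers.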


gahr added on 2024-04-22 12:40:23:

I've added a test in f90952f00b


anonymous (claiming to be Colin) added on 2024-04-23 06:49:43:

Perhaps a note should be added to the documentation to specify that input strings must consist only of characters in the 8-bit range, i.e. byte strings.


gahr added on 2024-05-22 11:51:48:

I've added a section to the README, see https://code.ptrcrt.ch/retcl/doc/trunk/docs/index.html#_encoding.