sacc - sacc(omys), simple console gopher client (mirror)
git clone https://git.parazyd.org/sacc
---
commit edab539b23594219bbfc83729822da917a18a243
parent c416c8c73d0a33eb8c428b1a9b9eaaffc098ee5b
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date:   Tue, 5 Jan 2021 21:21:03 +0100

mbsprint: improve printing output when it has invalid UTF data

Reset the decode state when mbtowc returns -1.
The OpenBSD mbtowc(3) man page says:
"If a call to mbtowc() resulted in an undefined internal state, mbtowc()
must be called with s set to NULL to reset the internal state before it
can safely be used again."

Print the UTF replacement character (codepoint 0xfffd) for the invalid
codepoint or incomplete sequence and continue printing the line (instead of
stopping).

Remove the 0 return code as it can't happen because we're already checking
the string length in the loop.

Diffstat:
  M sacc.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
---
diff --git a/sacc.c b/sacc.c
@@ -110,12 +110,18 @@ mbsprint(const char *s, size_t len)
 
 	slen = strlen(s);
 	for (i = 0; i < slen; i += rl) {
-		if ((rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4)) <= 0)
-			break;
+		rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4);
+		if (rl == -1) {
+			mbtowc(NULL, NULL, 0); /* reset state */
+			fputs("\xef\xbf\xbd", stdout); /* replacement character */
+			col++;
+			rl = 1;
+			continue;
+		}
 		if ((w = wcwidth(wc)) == -1)
 			continue;
 		if (col + w > len || (col + w == len && s[i + rl])) {
-			fputs("\xe2\x80\xa6", stdout);
+			fputs("\xe2\x80\xa6", stdout); /* ellipsis */
 			col++;
 			break;
 		}
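
The recovery pattern in this commit (reset the mbtowc state on -1, emit the replacement character, advance by one byte and keep going) can be sketched in isolation. This is a minimal illustration, not code from sacc: the function name `sanitize`, its output buffer parameters and its return value are all hypothetical. It writes to a caller-supplied buffer instead of stdout and leaves out the column-width logic, so that the behaviour can be checked without a terminal.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of sacc): copy s into out, replacing every
 * byte that mbtowc() cannot decode with the UTF-8 encoding of U+FFFD.
 * Returns the number of replacements made, or -1 if out is too small.
 */
static int
sanitize(const char *s, char *out, size_t outsz)
{
	static const char repl[] = "\xef\xbf\xbd"; /* U+FFFD in UTF-8 */
	wchar_t wc;
	size_t i, o = 0, slen = strlen(s);
	int rl, bad = 0;

	if (outsz == 0)
		return -1;

	mbtowc(NULL, NULL, 0); /* start from the initial state */
	for (i = 0; i < slen; i += rl) {
		rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4);
		if (rl == -1) {
			mbtowc(NULL, NULL, 0); /* reset state after the error */
			if (o + 3 >= outsz)
				return -1; /* no room for U+FFFD plus NUL */
			memcpy(out + o, repl, 3);
			o += 3;
			bad++;
			rl = 1; /* skip only the offending byte */
			continue;
		}
		/* rl is never 0 here: i < slen, so s[i] is not NUL */
		if (o + (size_t)rl >= outsz)
			return -1;
		memcpy(out + o, s + i, (size_t)rl);
		o += (size_t)rl;
	}
	out[o] = '\0';
	return bad;
}
```

What counts as undecodable depends on the LC_CTYPE locale in effect, so a caller would normally run `setlocale(LC_CTYPE, "")` first, as with any mbtowc-based code. Advancing by a single byte after an error means a truncated multi-byte sequence can produce more than one replacement character, which matches the `rl = 1` choice in the diff.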