Unicode and UTF-8 Strings in Client Library
 
This document describes how the client library handles UTF-8 encoded strings. By default, all strings in the SILC protocol are UTF-8 encoded: every string sent to the server and every string received from the server is UTF-8 encoded. It is the application's responsibility to render the strings as well as possible in the user interface.
 
The exception is messages sent and received in the Message Payload, which can contain practically any kind of string in any character encoding, as well as binary data. If a UTF-8 encoded message is sent or received, this is indicated with the SILC_MESSAGE_FLAG_UTF8 flag, and the application can render the message accordingly.
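
The flag check described above could be sketched as follows. This is a minimal illustration and not the library's API: EXAMPLE_FLAG_UTF8 is a stand-in for SILC_MESSAGE_FLAG_UTF8 (its numeric value here is an assumption), and RenderMode and choose_render_mode() are hypothetical names. Real code would test the flag constant from the toolkit headers instead.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for SILC_MESSAGE_FLAG_UTF8 so that the sketch compiles on
   its own.  The numeric value is an assumption; real code must use the
   constant defined in the toolkit headers. */
#define EXAMPLE_FLAG_UTF8 0x0100u

/* Hypothetical rendering modes for a received message. */
typedef enum {
  RENDER_AS_UTF8,   /* payload is declared to be UTF-8 text */
  RENDER_AS_RAW     /* unknown encoding or binary data */
} RenderMode;

/* Hypothetical helper: decide how to render a message from its flags. */
RenderMode choose_render_mode(unsigned int flags)
{
  return (flags & EXAMPLE_FLAG_UTF8) ? RENDER_AS_UTF8 : RENDER_AS_RAW;
}
```

Messages without the flag are best treated as data of unknown encoding, since the payload may carry any character encoding or binary content.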
 
All other strings are always UTF-8 encoded, and the application needs to convert them to another character encoding if it does not support UTF-8 rendering in its user interface. Likewise, strings that the application passes to the library, such as nicknames, channel names, server names, host names, topic strings, and any command arguments, must always be UTF-8 encoded before they are passed to the library. The UTF-8 routines help the application developer encode and decode UTF-8 strings.
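
To illustrate what encoding a string before handing it to the library involves, the following sketch converts an ISO-8859-1 (Latin-1) string to UTF-8. It assumes the application's native encoding is Latin-1, and latin1_to_utf8() is a hypothetical helper written for this example, not a library function. In practice the toolkit's own UTF-8 routines are the better choice, since they are meant to handle this conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the SILC client library.
   Converts a NUL-terminated ISO-8859-1 string into a newly allocated,
   NUL-terminated UTF-8 string.  The caller frees the result.
   Returns NULL if memory allocation fails. */
char *latin1_to_utf8(const char *latin1)
{
  size_t len, i, o = 0;
  unsigned char *out;

  assert(latin1 != NULL);
  len = strlen(latin1);

  /* Each Latin-1 byte expands to at most two UTF-8 bytes. */
  out = malloc(len * 2 + 1);
  if (!out)
    return NULL;

  for (i = 0; i < len; i++) {
    unsigned char c = (unsigned char)latin1[i];
    if (c < 0x80) {
      out[o++] = c;                                   /* ASCII as is */
    } else {
      out[o++] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte */
      out[o++] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation */
    }
  }
  out[o] = '\0';
  return (char *)out;
}
```

The result can then be passed to the library wherever a nickname, channel name, topic, or other command argument is expected.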
 
The client library never encodes or decodes strings to or from the current locale. It always expects that every string it receives from the application is already UTF-8 encoded. The library may validate certain UTF-8 strings and return an error if needed. The server may also send an error in a command reply if strings are not encoded properly.
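
Because improperly encoded strings can be rejected by the library or the server, an application may want to check its input first. The sketch below shows one way such a check could look. utf8_is_valid() is a hypothetical, self-contained helper following the usual UTF-8 well-formedness rules; it is not the library's own validation routine, and the library's checks may be stricter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the SILC client library.
   Returns true if the `len' bytes at `s' are well-formed UTF-8:
   no truncated or overlong sequences, no surrogates, and no code
   points above U+10FFFF. */
bool utf8_is_valid(const unsigned char *s, size_t len)
{
  size_t i = 0, k, need;
  unsigned long cp, min;

  assert(s != NULL || len == 0);

  while (i < len) {
    unsigned char c = s[i];

    if (c < 0x80) {                 /* single-byte (ASCII) */
      i++;
      continue;
    } else if ((c & 0xE0) == 0xC0) {
      need = 1; cp = c & 0x1F; min = 0x80;
    } else if ((c & 0xF0) == 0xE0) {
      need = 2; cp = c & 0x0F; min = 0x800;
    } else if ((c & 0xF8) == 0xF0) {
      need = 3; cp = c & 0x07; min = 0x10000;
    } else {
      return false;                 /* stray continuation or bad lead */
    }

    if (len - i - 1 < need)
      return false;                 /* sequence is truncated */

    for (k = 1; k <= need; k++) {
      if ((s[i + k] & 0xC0) != 0x80)
        return false;               /* not a continuation byte */
      cp = (cp << 6) | (s[i + k] & 0x3F);
    }

    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return false;                 /* overlong, too large, surrogate */

    i += need + 1;
  }
  return true;
}
```

Checking before the call lets the application report the problem to the user directly instead of interpreting an error returned later.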
 
Nicknames and channel names in SILC are also UTF-8 encoded and can include practically any kind of letters, numbers, and punctuation marks. Control characters and other special characters are not allowed in nickname strings, and the application never receives such nicknames or channel names from the library.
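
Since control characters are not allowed, an application might pre-check a nickname that the user typed before sending it. The following is only a rough sketch: has_control_chars() is a hypothetical helper that looks for the C0 and C1 control ranges and DEL in a string already known to be valid UTF-8. The full set of disallowed characters is defined by the protocol and is broader than what this check covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the SILC client library.
   Returns true if the valid UTF-8 string at `utf8' contains a C0
   control (U+0000..U+001F), DEL (U+007F), or a C1 control
   (U+0080..U+009F).  This is an approximation of the protocol's
   rules, not a complete implementation of them. */
bool has_control_chars(const unsigned char *utf8, size_t len)
{
  size_t i;

  assert(utf8 != NULL || len == 0);

  for (i = 0; i < len; i++) {
    unsigned char c = utf8[i];

    if (c < 0x20 || c == 0x7F)
      return true;                  /* C0 control or DEL */

    /* C1 controls are encoded as 0xC2 followed by 0x80..0x9F. */
    if (c == 0xC2 && i + 1 < len &&
        utf8[i + 1] >= 0x80 && utf8[i + 1] <= 0x9F)
      return true;
  }
  return false;
}
```

A name that passes this check can still be rejected by the library or the server, so the application should handle errors from the relevant command as well.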