A character, commonly abbreviated as "char", is a computer symbol, letter, or number. A keyboard is an input device that inputs a character when a key is pressed. In Scratch, characters are used in strings, arguments, and any situation in the Scratch editor or the playable project where text is required.
Computers use encoding sets to represent characters. Since computers only understand binary code, characters are identified by certain binary sequences. There are many variations and standards across the globe that have changed throughout history.
Types of Characters
Letters are characters from an alphabet. In English, they consist of lowercase and uppercase characters ranging from the letter "A" to "Z". Combining letters can create words, and combining words can create sentences. "Character" is simply a more universal word that encompasses letters as well as other kinds of symbols.
Computers have a wide range of recognizable symbols. Some are present on standard keyboards, while others may need to be input via software rather than a hardware device. An example of a symbol is the common pound sign: "#". The pound sign is also known as a "hashtag" on social media websites and is one of the most commonly used symbols. The "&" symbol is also common and represents the word "and" with one character.
- Not to be confused with Scratch Emojis.
Emojis are small images and "smileys". They are recognized computer characters and can even be used in project names. Emojis have surged in popularity in the last decade due to their fun nature and easy accessibility on cell phones. To input an emoji into a project title, there are various methods:
- Perform the input on a cell phone from the project page
- Copy-and-paste an emoji from another Internet source
- Use an on-screen keyboard with emoji support on a computer
In Windows 10, the default on-screen keyboard does not support emojis. However, there is a second on-screen keyboard called the "Touch Keyboard" that has emoji support. The Touch Keyboard can be used even without a touch screen; it has support for a traditional computer mouse. To enable it, right-click the task bar and select "Show touch keyboard button". From there, the Touch Keyboard icon will appear on the right side of the task bar. On the virtual keyboard, the "smiley" button displays the emoji options.
- Main article: Numbers
Numbers are also symbols, and they are unique symbols that can have mathematical operations performed on them. Single number characters can be combined to form larger or more precise numbers. The basic numbers range from "0" to "9". Decimal numbers often use the "." character to represent a decimal point. While the "." character alone is a symbol and not a number, it can be used with a number.
Some characters are "invisible", meaning computers do not display them on a screen. An example of this is the character produced by the "escape" key. Other examples include the character for the "enter" key, the "tab" key, and even the value "null". Null does not have any visual representation but is important in computer programming. In the language C, the "null" character is used to denote the end of a string.
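As a rough illustration, the following Python sketch (Python is used here only for demonstration; it is not part of Scratch) shows that invisible characters still have code values. The dictionary name and its labels are made up for this example, and the "enter" key is assumed to produce a line feed, which varies between systems.

```python
# Invisible (control) characters still have numeric codes.
# Assumption: "enter" is shown as a line feed; some systems use a
# carriage return (code 13) instead, or both together.
invisible = {
    "null": "\0",
    "tab": "\t",
    "enter (line feed)": "\n",
    "escape": "\x1b",
}

for name, char in invisible.items():
    # repr() shows an escape sequence instead of trying to draw the character
    print(f"{name}: {char!r} has code {ord(char)}")
```

The codes printed here (0, 9, 10, and 27) are the same values these characters have in ASCII.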
Some computer programs may only allow certain characters to be used in specific circumstances. For instance, Scratch does not allow letter or symbol characters to be typed into a numeric insert. It is up to the programmer to decide what characters are allowed or not. Many websites only allow letters, numbers, and a few symbols to be used for usernames. Passwords often allow a larger range of characters for enhanced security.
- Main article: String
A string is a chain of characters. A phrase, word, or even random jumble of characters can be a string. The communication of ideas typically cannot be done with single characters, so multiple are used in unison. A string can consist of a single character, however. In Scratch, strings are commonly used in lists, blocks such as Say (), encoding and decoding cloud data, and more.
Retrieving a Character from a String
- Main article: Letter () of () (block)
In Scratch, the letter () of () block is used to retrieve a single character from a string. For instance, if the first letter of "Hello World" is to be obtained, arguments can be entered into the block to form letter (1) of [Hello World].
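For readers who know a text-based language, the same idea can be sketched in Python. The function name `letter_of` is hypothetical, and the sketch assumes that a position outside the string gives an empty result rather than an error.

```python
def letter_of(index: int, text: str) -> str:
    """Return the character at a 1-based position, like letter () of ().

    Assumption: an out-of-range position gives an empty string.
    """
    if 1 <= index <= len(text):
        return text[index - 1]  # Python counts from 0, Scratch counts from 1
    return ""


print(letter_of(1, "Hello World"))  # H
```

The main difference to keep in mind is the counting: the block treats the first character as position 1, while Python treats it as position 0.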
A computer does not recognize characters like a human. A human sees a frowning emoji and interprets it as sadness. A human sees numbers and associates mathematics with them. A computer is merely a machine that represents characters with standardized formats known as encodings. Basically, each character is assigned a code value. Usually the code values are organized for the programmer's ease. For instance, the letters' codes will be in alphabetical order. Numbers, likewise, will be in a simple order.
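The ordering of code values can be seen with Python's built-in `ord` and `chr` functions. This is only a small demonstration, and the variable name `next_letter` is made up for the example.

```python
# ord() gives a character's code value; chr() goes the other way.
for char in "ABC":
    print(char, ord(char))  # A 65, B 66, C 67

for char in "012":
    print(char, ord(char))  # 0 48, 1 49, 2 50

# Because the codes are consecutive, arithmetic on codes walks the alphabet.
next_letter = chr(ord("A") + 1)
print(next_letter)  # B
```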
The American Standard Code for Information Interchange is an old but still-available encoding standard. Each letter is associated with an ASCII code and also represented by a single byte (8 bits). Originally only 7 bits were used to represent ASCII characters, allowing 128 characters. They were still represented by a single byte, though, since computers work with bytes better than an odd number of bits. Eventually an extended family of ASCII characters came out with the characters 128-255, becoming known as ANSI.
ASCII is a limited character set because of its history. It was designed as a 7-bit code, so the character set was restricted to 128 characters (codes 0-127). This predominantly included characters most associated with the English language, along with a number of non-printing control characters. The following table provides a snippet of some of the control characters at the start of the set:
|Decimal Code||Binary Value||Character|
|1||000 0001||Start of heading|
|2||000 0010||Start of text|
|3||000 0011||End of text|
|4||000 0100||End of transmission|
|9||000 1001||Horizontal Tab|
|10||000 1010||Line Feed|
|11||000 1011||Vertical Tab|
|12||000 1100||Form Feed|
|13||000 1101||Carriage Return|
|14||000 1110||Shift Out|
|15||000 1111||Shift In|
|16||001 0000||Data Link Escape|
|17||001 0001||Device Control 1 (XON)|
|18||001 0010||Device Control 2|
|19||001 0011||Device Control 3 (XOFF)|
|20||001 0100||Device Control 4|
|21||001 0101||Negative Acknowledgement|
|22||001 0110||Synchronous Idle|
|23||001 0111||End of Transmission Block|
|25||001 1001||End of Medium|
|28||001 1100||File Separator|
|29||001 1101||Group Separator|
|30||001 1110||Record Separator|
|31||001 1111||Unit Separator|
ANSI is an extension of the ASCII encoding, doubling the number of characters. It contains characters with codes ranging from 0-255. It differs from ASCII notably by using 8 bits instead of 7 bits to represent a single character. In the present day, though, this difference is insignificant since ASCII characters are essentially stored as 8-bit values with the first bit always set to "0". Likewise, the values that ANSI adds onto the ASCII character set have the first bit set to "1".
In different countries, the same ANSI code can represent different characters. The encoding system itself (ANSI) uses the same logic, but which codes are associated with which symbols varies. A mapping standard is a method of defining the codes for the desired characters. ISO-8859 and its variants are the most common mapping schemes for Western languages in ANSI.
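The role of the first bit can be checked with a short Python sketch. It assumes ISO-8859-1 (known to Python as "latin-1") as the 8-bit mapping, and the helper name `to_bits` is invented for this example.

```python
def to_bits(code: int) -> str:
    """Format a code value (0-255) as 8 binary digits."""
    return format(code, "08b")


ascii_code = ord("A")                     # 65, inside the 0-127 ASCII range
extended_code = "é".encode("latin-1")[0]  # 233, inside the 128-255 range

print(to_bits(ascii_code))     # 01000001 -> first bit is 0
print(to_bits(extended_code))  # 11101001 -> first bit is 1
```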
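As one concrete case, the byte with code 164 stands for different symbols in two members of the ISO-8859 family. The Python sketch below uses the standard codec names `iso-8859-1` and `iso-8859-15`; the variable name `raw` is arbitrary.

```python
raw = bytes([0xA4])  # one byte with the code value 164

# The same byte, read with two different mappings:
print(raw.decode("iso-8859-1"))   # ¤ (generic currency sign)
print(raw.decode("iso-8859-15"))  # € (euro sign)
```

A file saved with one mapping and opened with another would therefore show the wrong symbol, even though none of its bytes changed.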
Inputting Characters off the Keyboard
Keyboards only have a limited number of characters. If, for instance, one wants to enter the "°" symbol on Windows, the "alt" key can be held down while "0176" is typed on the right-hand number pad of the keyboard. "176" is the respective code for the degree sign in ANSI. This is a feature of the Windows operating system rather than of the keyboard itself.
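The link between the code 176 and the degree sign can be checked in Python. This sketch assumes the ANSI code page in use is Windows-1252 (`cp1252`), which is common on Western-language Windows systems but not universal.

```python
code = 176

# Assumption: Windows-1252 is the active ANSI code page.
char = bytes([code]).decode("cp1252")
print(char)  # °
```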
UTF-8 is a more modern standard that encompasses over a million characters without necessarily requiring multiple bytes per character. The standard can be used globally, allowing Chinese characters to be used in the same text as Spanish characters. UTF-8 encodes into certain bits information on the length of the sequence of bits that represents the character. For example, if a certain character has a very long code, a few bits at the start act as a "flag" to alert the computer that it is a longer character. The computer then takes the following byte or bytes into account as part of the same character.
Some characters are represented by fewer bytes than others in the encoding. This allows files to be smaller in size than with an encoding that gives every character the same number of bytes. The UCS-4 character encoding represents all characters with 4 bytes. While some characters in UTF-8 may be represented by 4 bytes, many are only represented by 1 or 2. The following chart shows the number of bytes required for a range of characters:
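The "flag" bits can be made visible with a short Python sketch. The function name `sequence_length` is hypothetical, and the function only inspects the bit pattern of the first byte; it is not a full UTF-8 validator.

```python
def sequence_length(first_byte: int) -> int:
    """Return how many bytes a UTF-8 character uses, judging by its first byte."""
    if first_byte >> 7 == 0b0:      # 0xxxxxxx: single byte (ASCII)
        return 1
    if first_byte >> 5 == 0b110:    # 110xxxxx: two bytes
        return 2
    if first_byte >> 4 == 0b1110:   # 1110xxxx: three bytes
        return 3
    if first_byte >> 3 == 0b11110:  # 11110xxx: four bytes
        return 4
    raise ValueError("not a valid first byte of a UTF-8 sequence")


for char in ["A", "é", "€"]:
    encoded = char.encode("utf-8")
    bits = " ".join(format(byte, "08b") for byte in encoded)
    print(char, bits, sequence_length(encoded[0]))
# A 01000001 1
# é 11000011 10101001 2
# € 11100010 10000010 10101100 3
```

Every byte after the first begins with the bits "10", which marks it as a continuation of the current character rather than the start of a new one.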
|Bytes Per Character|
|Min Character Code||Max Character Code||Bytes|
|0||127||1|
|128||2047||2|
|2048||65535||3|
|65536||1114111||4|
Because of this setup, all the original ASCII characters (0-127) are still only 1 byte in UTF-8. The least common characters take up a larger number of bytes.
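The byte counts can be confirmed by encoding a few sample characters in Python. The list name `samples` and the choice of characters are only for illustration.

```python
samples = ["A", "é", "€", "😀"]

for char in samples:
    code = ord(char)                  # the character's code value
    size = len(char.encode("utf-8"))  # bytes needed in UTF-8
    print(f"{char}: code {code}, {size} byte(s)")
# A: code 65, 1 byte(s)
# é: code 233, 2 byte(s)
# €: code 8364, 3 byte(s)
# 😀: code 128512, 4 byte(s)
```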
UTF-16 and UTF-32 also exist but are less common than UTF-8. UTF-16 uses a minimum of 16 bits or 2 bytes for every character. One would assume this would make files larger than UTF-8, but some characters that are represented by 3 bytes in UTF-8 may be represented by 2 bytes in UTF-16. A character whose code needs 16 bits actually takes up 3 bytes in UTF-8 because some of the bits are used to signal that multiple bytes are necessary. In UTF-16, a 16-bit character can be represented by 2 bytes.
Unicode is a standardized character set that can store over a million characters. UTF-8 encodes the Unicode character set. Unicode itself does not specify how to encode its data into binary; it is merely a large database of code values for many characters. Unicode is constantly being updated with new values, as not all have been filled yet. On May 18, 2017, the Emoji 5.0 set of characters was released.
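A quick comparison in Python shows the trade-off. The sketch uses the `utf-16-le` codec so that Python does not add a 2-byte byte-order mark to the result; the sample characters are arbitrary.

```python
for char in ["A", "€", "😀"]:
    utf8_size = len(char.encode("utf-8"))
    # "utf-16-le" avoids the byte-order mark that plain "utf-16" prepends
    utf16_size = len(char.encode("utf-16-le"))
    print(f"{char}: UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
# A: UTF-8 1 bytes, UTF-16 2 bytes
# €: UTF-8 3 bytes, UTF-16 2 bytes
# 😀: UTF-8 4 bytes, UTF-16 4 bytes
```

Text made mostly of ASCII characters is therefore smaller in UTF-8, while text made mostly of characters such as "€" can be smaller in UTF-16.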
Usage in Scratch
- Main article: Encoding and Decoding Cloud Data
Foreknowledge of computer character encoding standards can be beneficial when developing Scratch projects. Particularly, using cloud variables to store more data than a counter or high score value requires a custom-made encoding system. Cloud variables are only capable of storing numbers, so if text is to be stored, it needs to be translated into numeric codes. This is in line with how computers work, as they translate text into sequences of "1"s and "0"s.
Similar to ASCII encoding, a system can be designed where each character is assigned a code, and the cloud variable contains a sequence of codes. When the data is to be read, it must be decoded by looking up the characters associated with their respective code values. Since cloud variables allow the digits 0-9, fewer digits can represent the same range of characters as ASCII, which requires 7 binary digits (bits) per character; three decimal digits are enough to cover all 128 ASCII codes.
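A fixed-width scheme of this kind might look like the following Python sketch. It is not the method of any particular project: the character list, the names `CHARACTERS`, `WIDTH`, `OFFSET`, `encode`, and `decode`, and the choice of two digits per character are all assumptions made for the example. In Scratch, the same logic would be built from a list and the letter () of () block.

```python
# Hypothetical set of allowed characters; a Scratch project would keep
# these in a list.
CHARACTERS = list("abcdefghijklmnopqrstuvwxyz0123456789 _-")
WIDTH = 2    # decimal digits used for every character
OFFSET = 10  # first code is 10, so no code ever starts with a zero


def encode(text: str) -> str:
    """Turn text into a string of digits, WIDTH digits per character."""
    digits = ""
    for char in text:
        # index() raises ValueError for a character that is not in the list
        digits += str(CHARACTERS.index(char) + OFFSET)
    return digits


def decode(digits: str) -> str:
    """Read WIDTH digits at a time and look each code up in the list."""
    text = ""
    for start in range(0, len(digits), WIDTH):
        code = int(digits[start:start + WIDTH])
        text += CHARACTERS[code - OFFSET]
    return text


encoded = encode("cat")
print(encoded)          # 121029
print(decode(encoded))  # cat
```

Starting the codes at 10 is a design choice: every code is then exactly two digits long, and the encoded value never begins with a zero that a numeric variable might drop.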
UTF-8 can also be replicated with cloud variables by using some digits to represent how many other digits are part of the same character before moving onto the next one. Consider that the first number in the cloud variable signifies how many following digits make up the code for the next char. If the cloud variable's encoded data is 3564298, then the first character code is "564" followed by "98". The "3" signifies that there are 3 digits in the first character, and the "2" signifies that there are 2 digits in the second character.
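The length-prefixed scheme described above can be sketched in Python as follows. The function names `split_codes` and `join_codes` are hypothetical, and the sketch assumes well-formed input: one length digit from 1 to 9 before each code, with no error handling for malformed data.

```python
def split_codes(digits: str) -> list:
    """Split length-prefixed digits into a list of character codes."""
    codes = []
    position = 0
    while position < len(digits):
        length = int(digits[position])  # e.g. "3": the next code has 3 digits
        start = position + 1
        codes.append(int(digits[start:start + length]))  # e.g. "564"
        position = start + length       # jump to the next length digit
    return codes


def join_codes(codes: list) -> str:
    """Build the length-prefixed digit string from a list of codes."""
    return "".join(f"{len(str(code))}{code}" for code in codes)


print(split_codes("3564298"))  # [564, 98]
print(join_codes([564, 98]))   # 3564298
```

Because the length is stored in a single digit, each code can be at most 9 digits long in this sketch.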
A list can then be used where the index corresponds to the character code. This type of system would be beneficial if a large number of characters is to be recognized by the project. Restrictions can be set by only allowing certain characters, using the simpler ASCII-based encoding system with a fixed number of digits per character. This, however, may cause issues if the project attempts to encode a username containing an "illegal" character into the cloud variable. More complex logic could account for such circumstances.