Unicode Hurts?
.encode() and .decode()?
unidecode module?
¯\_(ツ)_/¯
You probably didn't really fix the problem
At best you got lucky, at worst you corrupted data!
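A minimal sketch of how blind .encode()/.decode() calls go wrong. The string "café" and the variable names are made up for illustration; the point is that decoding with the wrong codec "works" without an error but yields mojibake, and silencing errors throws data away.

```python
# Hypothetical example text; any non-ASCII string shows the same effect.
original = "café"

# Correct UTF-8 bytes for the text.
utf8_bytes = original.encode("utf-8")            # b'caf\xc3\xa9'

# "Getting lucky" gone wrong: decoding UTF-8 bytes as Latin-1 raises
# no error, but each byte of the two-byte sequence becomes its own character.
mojibake = utf8_bytes.decode("latin-1")          # 'cafÃ©'

# "Making the error go away" by ignoring it silently drops the character.
lossy = original.encode("ascii", errors="ignore")  # b'caf'

# The mojibake is only recoverable if you know exactly which wrong
# codec was applied and reverse that step.
repaired = mojibake.encode("latin-1").decode("utf-8")

print(mojibake, lossy, repaired)
```

Neither failure raises an exception, which is why the "fix" often looks like it worked.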
What's the root cause?
Human communication is complicated and messy
Communicating via computers only makes this worse
Written language uses glyphs
Need to convert analog glyphs to digital
ASCII: one of the better old encodings
Binary Value => Character
0b0000000 => NUL
0b1111111 => DEL
0b1000001 => A
0b1000010 => B
0b1100001 => a
0b1100010 => b
ISO-Latin-1 adds an 8th bit
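The 7-bit mapping can be checked from Python with ord() and a binary format spec. This is a small sketch; the sample characters and variable names are chosen for illustration. It also shows one character, "é", that does not fit in 7 bits but does fit once ISO-Latin-1 adds the 8th bit.

```python
# Sample characters: NUL, DEL, and a few letters.
samples = ["\x00", "\x7f", "A", "B", "a", "b"]

# Map each character to its 7-bit binary value.
seven_bit = {ch: f"0b{ord(ch):07b}" for ch in samples}
for ch, bits in seven_bit.items():
    print(f"{bits} => {ch!r}")

# 'é' has no 7-bit ASCII value...
try:
    "é".encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

# ...but ISO-Latin-1 (ISO-8859-1) stores it in a single 8-bit byte.
latin1_byte = "é".encode("latin-1")              # b'\xe9'
latin1_bits = f"0b{latin1_byte[0]:08b}"          # high bit is set
print(fits_in_ascii, latin1_byte, latin1_bits)
```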
What about other languages?
Define abstract "Code Points"
Leave representation up to other software
Maximum of 17 * 2^16, or 1,114,112, Code Points
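A short sketch of the arithmetic and of the split between code points and their representation. The constant names are illustrative; the 17 planes of 2^16 code points each are what gives the 1,114,112 limit, and the same code point comes out as different bytes depending on which encoding the software picks.

```python
import sys

PLANES = 17
CODE_POINTS_PER_PLANE = 2 ** 16
TOTAL_CODE_POINTS = PLANES * CODE_POINTS_PER_PLANE   # 1,114,112

# Python exposes the highest valid code point, which is the total minus one.
highest_code_point = sys.maxunicode                  # 0x10FFFF

# Anything past the limit is rejected.
try:
    chr(TOTAL_CODE_POINTS)
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True

# One abstract code point, several concrete byte representations.
snowman = chr(0x2603)
representations = {
    name: snowman.encode(name)
    for name in ("utf-8", "utf-16-be", "utf-32-be")
}
print(TOTAL_CODE_POINTS, hex(highest_code_point), representations)
```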
Version 9.0 (June 2016)
135 Scripts
128,237 Characters (11.5% of total)
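The percentage is just the character count divided by the code point limit. A quick check follows, with the slide's figures hard-coded; the Unicode version bundled with your own Python will usually be newer than 9.0, so the last line only reports it and makes no assumption about its value.

```python
import unicodedata

characters_in_9_0 = 128_237          # figure quoted on the slide
total_code_points = 17 * 2 ** 16     # 1,114,112

share = characters_in_9_0 / total_code_points * 100
print(f"{share:.1f}% of all code points")

# Version of the Unicode database shipped with this interpreter.
bundled_version = unicodedata.unidata_version
print("This Python bundles Unicode", bundled_version)
```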
| ☃ | |
| Code Point | U+2603 |
| UTF-8 | E2 98 83 |
| Name | SNOWMAN |
| Alias | Snowy Weather |
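Most of the table can be reproduced with the standard library. This is a sketch using ord(), str.encode() and the unicodedata module. As far as I know, unicodedata does not expose informal aliases such as "snowy weather", so that row is not reproduced here.

```python
import unicodedata

snowman = "☃"

# Code point, formatted in the usual U+XXXX notation.
code_point = f"U+{ord(snowman):04X}"

# UTF-8 bytes, shown as space-separated hex pairs.
utf8_hex = " ".join(f"{byte:02X}" for byte in snowman.encode("utf-8"))

# Official character name, and the reverse lookup by name.
name = unicodedata.name(snowman)
looked_up = unicodedata.lookup("SNOWMAN")

print(code_point, utf8_hex, name, looked_up)
```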
NULL!)
(╯°□°)╯︵ ┻━┻