General Error SQL ERROR on special unicode character

If you have issues or comments about the forums or website, post them here.
Post Reply
User avatar
SVlad
Posts: 305
Joined: Thu Aug 27, 2015 5:43 pm
Location: Saint-Petersburg, Russia

General Error SQL ERROR on special unicode character

Post by SVlad »

Code: Select all

General Error
SQL ERROR [ mysql4 ]

Incorrect string value: '\xF0\x9F\x8F\xAF -...' for column 'post_text' at row 1 [1366]

An SQL error occurred while fetching this page. Please contact the Board Administrator if this problem persists.
Get this error when tied to post message with this character https://www.compart.com/en/unicode/U+1F3EF.

User avatar
Arioch
Site Admin
Posts: 4486
Joined: Sat Mar 05, 2011 4:19 am
Location: San Jose, CA
Contact:

Re: General Error SQL ERROR on special unicode character

Post by Arioch »

Curious. It might be interpreted as some kind of escape code in the SQL engine, or perhaps it can't handle unicode values above a certain threshold. I don't really have any direct way to investigate it, as it's not my server.

EdwardSteed
Posts: 17
Joined: Fri Aug 31, 2018 12:17 am

Re: General Error SQL ERROR on special unicode character

Post by EdwardSteed »

I'll save you some time. This is a MySQL issue.

The column type for post_text is set to `utf8`. So you'd think it stores UTF-8 text, which covers anything Unicode. But MySQL's programmers didn't know how UTF-8 actually works so any UTF-8 code longer than 3 bytes causes it to fail. They screwed up so badly in how they were storing UTF-8 data that they couldn't just fix it, because if it started working correctly it would corrupt all the data people had already unknowingly stored using their bad Unicode implementation.

So what they did is create a second data type, `utfmb4`, which is UTF-8 implemented correctly. So unless you convert your entire database (and hope it doesn't mess up the software using it) you can't store anything that requires a 4 byte UTF-8 code. Fortunately the 5% of the Unicode range you can actually store represents almost all the languages in common use, with the exception of various names and older characters in languages like Chinese, some obscure or fictional languages, and a ton of emoji.

User avatar
Arioch
Site Admin
Posts: 4486
Joined: Sat Mar 05, 2011 4:19 am
Location: San Jose, CA
Contact:

Re: General Error SQL ERROR on special unicode character

Post by Arioch »

Makes sense.

User avatar
Razor One
Moderator
Posts: 562
Joined: Thu May 12, 2011 3:38 pm

Re: General Error SQL ERROR on special unicode character

Post by Razor One »

Since this occurred at the same time as the SQL Error, I'm assuming it's a related problem.

The user dragoongfa appears to have fallen into a memory hole. His profile appears to no longer exist, and all posts made by him have vanished. An Example can be seen here with a thread he started, yet no posts from him.

It's possible that other less prolific users may also have disappeared without note.
Image
SpoilerShow
This is my Mod voice. If you see this in a thread, it means that the time for gentle reminders has passed.

User avatar
Arioch
Site Admin
Posts: 4486
Joined: Sat Mar 05, 2011 4:19 am
Location: San Jose, CA
Contact:

Re: General Error SQL ERROR on special unicode character

Post by Arioch »

Dragoongfa has somehow been removed from the forums database. There is no administrative record of him being manually deleted, so presumably this happened in conjunction with problems with the user and sessions tables after the server maintenance.

I will look into the posts database and try to determine whether his posts are gone or just "lost" (assigned to a user ID that doesn't exist). Does anyone know when he last posted, and in what thread?

[edit] I was able to find that his posts are still in the database, so I reassigned them to his new user id. They are once again appearing in the threads.

novius
Posts: 654
Joined: Tue May 26, 2015 10:33 am

Re: General Error SQL ERROR on special unicode character

Post by novius »

As a side note, this issue with UTF-8 characters having more than two bytes (which is definitely legal and warranted in some cases) was the reason for iOS crashing when someone sent a text message containing a specific unicode character. Said character used three or four bytes, having made the UTF-8 translation in iOS barf and die.

Expect problems like these to be rather widespread since the used code and the fallacy of the software designers thinking "two bytes is enough" is rather common.

Post Reply