Chapter 15: Message Board, The General Subject Of Unicode Safe Data

mysql charset utf8 encoding

11 replies to this topic

#1 markifornia

markifornia

    Advanced Member

  • Members
  • 112 posts
  • Location: San Diego, California

Posted 24 July 2012 - 1:05 PM

The book, especially Chapter 14 (Making Universal Sites) and Chapter 15 (the message board example), gives recommendations for processing Unicode-safe data.

We learn that UTF-8 supports a wide range of languages. There was so much information to digest that I could not recall whether every action is required when building a universal/multilingual site. Are any of these set by default, particularly the database character set?

My MySQL is already set to the following (copied straight from phpMyAdmin): utf8_unicode_ci, Unicode (multilingual), case-insensitive.

Would I then still have to establish a charset and collation for the database?

Are these actions needed each time? Here is a list I gathered while reading.

Added between the head tags:
<meta http-equiv="content-type" content="text/html; charset=utf-8"> -> Pages 416-417

Must come before any HTML output, in a PHP script:
header('Content-Type: text/html; charset=UTF-8'); -> Page 452 (and more information in Chapter 14)

MySQL client:
CREATE DATABASE forum2 CHARACTER SET utf8; -> Page 444

MySQL connection:
mysqli_set_charset($dbc, 'utf8'); or mysqli_query($dbc, 'SET NAMES utf8'); -> Page 450



Any insight or comments on this would be greatly appreciated.

-Mark

#2 HartleySan

HartleySan

    Advanced Member

  • Members
  • 2,892 posts
  • Location: Columbus, OH USA

Posted 24 July 2012 - 11:17 PM

I've made a number of Japanese DBs, and here's what I've observed:
1) MySQL DBs will not have their charset/collation set to UTF-8 by default. Whenever I create a new DB, I always manually set the collation to utf8_general_ci.
2) If the DB collation is set to utf8_general_ci, then all text fields in the tables in the DB should automatically default to the same collation. If this isn't the case, just manually override any fields that you need UTF-8 for.
3) In order for a PHP script to be able to handle UTF-8 data sent to it, you must either use the HTML <meta charset="UTF-8"> element or execute the header('Content-Type: text/html; charset=UTF-8') command. The alternative is to edit your php.ini file so that the default charset is UTF-8.
4) In order to get the DB connection to properly send UTF-8 data back and forth, you need to execute the mysqli_set_charset($dbc, 'utf8') command. Be careful with the argument: it has to be MySQL's name for the character set, 'utf8', not 'UTF-8' with a hyphen. I believe it's also possible to set the default charset in the php.ini file, but I'm not sure about that.

If you do those four things, then all of your non-English/multilingual scripts/DBs should work fine.
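As a rough illustration of steps 3 and 4, here is a minimal PHP sketch. The helper name connect_utf8() and the host, user name, and password are placeholders made up for this example and are not from the book; forum2 is the database name mentioned earlier in the thread. Treat it as one possible arrangement, not the definitive way to do it.

```php
<?php
// Step 3: tell the browser the page is UTF-8.
// This must run before the script sends any output.
header('Content-Type: text/html; charset=UTF-8');

/**
 * Hypothetical helper: opens a mysqli connection and switches it to utf8.
 * Returns the connection, or false on failure. (On PHP 8.1 and later,
 * mysqli throws mysqli_sql_exception by default instead of returning false.)
 */
function connect_utf8($host, $user, $password, $database)
{
    $dbc = mysqli_connect($host, $user, $password, $database);
    if (!$dbc) {
        return false;
    }

    // Step 4: make the client/server connection itself use UTF-8.
    if (!mysqli_set_charset($dbc, 'utf8')) {
        mysqli_close($dbc);
        return false;
    }

    return $dbc;
}

// Example call with placeholder credentials:
// $dbc = connect_utf8('localhost', 'username', 'password', 'forum2');
```

Steps 1 and 2 (the database and column character sets) are handled once in MySQL or phpMyAdmin, so they do not appear in the script.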

Edit: It was kinda vague in my original post, but I should differentiate between charset and collation. For a clear explanation, please see the following:
http://stackoverflow...on-mean-exactly

#3 markifornia

Posted 25 July 2012 - 12:17 PM

Many thanks for your observations, Hartley.

I had to read that great Stack Overflow example twice, but I finally got it.

I've made a number of Japanese DBs, and here's what I've observed:
3) In order for a PHP script to be able to handle UTF-8 data sent to it, you must either use the HTML <meta charset="UTF-8"> element or execute the header('Content-Type: text/html; charset=UTF-8') command.


Pages 452-453. Or using both is fine, I guess.

Either way, this satisfies my question.

#4 markifornia

Posted 25 July 2012 - 1:17 PM

Hartley, on page 448 there are instructions for inserting languages.

How do we add French, Greek, Portuguese, and Japanese to the table? It's not taking them because they contain non-English characters. I copied and pasted them off the web; I don't know if that's the reason why.

I have taken these steps in my mysql client (shell):

(1) mysql> CHARSET utf8;

(2) Altered all my tables to use utf8 encoding.


INSERT INTO languages (lang, lang_eng) VALUES
('English', 'English'),
('Português', 'Portuguese'),
('Français', 'French'),
('Norsk', 'Norwegian'),
('Romanian', 'Romanian'),
('ελληνικά', 'Greek'),
('Deutsch', 'German'),
('Srpski', 'Serbian'),
('日本国', 'Japanese'),
('Nederlands', 'Dutch');
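
For comparison, here is a minimal sketch of how a few of the same rows might be inserted from a PHP script instead of the mysql client. The helper name insert_languages() is made up for this example, and it assumes an already open mysqli connection and a script file that is itself saved as UTF-8.

```php
<?php
/**
 * Hypothetical helper: inserts (lang, lang_eng) pairs into the languages
 * table after switching the connection to utf8.
 * Returns the number of rows that were inserted.
 */
function insert_languages($dbc, array $rows)
{
    // Without this, the server may not treat the incoming bytes as UTF-8.
    mysqli_set_charset($dbc, 'utf8');

    $stmt = mysqli_prepare(
        $dbc,
        'INSERT INTO languages (lang, lang_eng) VALUES (?, ?)'
    );
    if (!$stmt) {
        return 0;
    }

    $count = 0;
    foreach ($rows as $row) {
        list($lang, $lang_eng) = $row;
        // Both columns are bound as strings ('ss').
        mysqli_stmt_bind_param($stmt, 'ss', $lang, $lang_eng);
        if (mysqli_stmt_execute($stmt)) {
            $count++;
        }
    }
    mysqli_stmt_close($stmt);

    return $count;
}

// A few of the rows from the statement above, as UTF-8 string literals.
$rows = array(
    array('Português', 'Portuguese'),
    array('Français', 'French'),
    array('ελληνικά', 'Greek'),
    array('日本国', 'Japanese'),
);

// Example call, given an open mysqli connection in $dbc:
// echo insert_languages($dbc, $rows) . " rows inserted\n";
```

Using a prepared statement here also avoids the quoting mistakes that are easy to make when typing a long VALUES list by hand.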


Thanks,
Mark

#5 HartleySan

Posted 25 July 2012 - 8:15 PM

Are you trying to insert directly from the MySQL client/phpMyAdmin, or are you trying to insert from a PHP script?
If it's the latter, what kind of code are you using for your PHP script?

#6 markifornia

Posted 25 July 2012 - 8:42 PM

I'm actually using the mysql client through the terminal command line.

like so:

mysql> INSERT INTO languages (lang, lang_eng) VALUES
('English', 'English'),
('Português', 'Portuguese'),
('Français', 'French'),
('Norsk', 'Norwegian'),
('Romanian', 'Romanian'),
('ελληνικά', 'Greek'),
('Deutsch', 'German'),
('Srpski', 'Serbian'),
('日本国', 'Japanese'),
('Nederlands', 'Dutch');

But when I get to the first row that contains non-English characters, it doesn't paste correctly.

#7 markifornia

Posted 25 July 2012 - 8:47 PM

Posted Image

It just refuses to take the different language encodings.

#8 HartleySan

Posted 25 July 2012 - 10:03 PM

I never use the MySQL client, so it could possibly be a display issue.
Although, more than likely, your table is not properly set up.
Could you please show the table structure?
Thanks.

#9 markifornia

Posted 26 July 2012 - 12:12 AM

Here I've taken some screenshots of the database and table. It is strange that Larry has an example with the mysql client using the terminal. I am not sure how he got the accents to work or if special keys were required (or a copy and paste?). Page 448 shows in plain sight that he used this method.

I hope the screenshots help. I tried to confirm that the tables are in fact using UTF-8, but couldn't find where that is shown. Thanks.

Posted Image

Posted Image

#10 markifornia

Posted 26 July 2012 - 12:45 AM

Hey Hartley, well, I resorted to just entering the languages straight into phpMyAdmin by copying and pasting the Portuguese. This worked fine. I also used a command to list the columns in the database, and the accented Portuguese showed up fine. Not sure how Larry did it; it might have to do with a locale configuration script within ssh.

See below:

Posted Image

Posted Image

#11 HartleySan

Posted 26 July 2012 - 1:28 AM

Glad you got it working. Sorry I couldn't comment more on the MySQL client, but I always use phpMyAdmin, and have never had a problem with it.
Glad you resolved the issue though.

Good luck with the rest of your project.

#12 markifornia

Posted 26 July 2012 - 11:05 AM

Thanks Hartley, your input is always useful.