Oracle Character set change

A character set does not define languages; it defines a certain range of characters. Any language that uses only characters known to that character set can then be stored. Oracle itself warns that when you change character sets, some characters you currently use may not be defined in the new character set, or the current setup may not be used correctly, so you could lose data when changing the NLS_CHARACTERSET.

Refer to the following note for migrating the character set of an Oracle Applications database:

Note 124721.1 (Migrating an Applications Installation to a New Character Set.)

(In case of any issues or doubts, it is better to log an SR with Oracle Support for assistance.)

You should always check this by using the Character Set Scanner (Csscan) before making any changes to your character set, even when using exp/imp or expdp/impdp.

Note 458122.1 Installing and Configuring Csscan in 8i and 9i (Database Character Set Scanner)
Note 745809.1 Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner)
Note 444701.1 Csscan output explained

For the majority of customers a Unicode character set (AL32UTF8) is the best choice. See Note 333489.1 Choosing a database character set means choosing Unicode.

For non-Unicode character sets the best choice is an xx8MSWIN125x character set, even if the database itself runs on a Unix platform. The reason is simply that the majority of clients are Windows-based systems, hence the best non-Unicode character set for a database is one that can store all the characters known by those clients, which means an xx8MSWIN125x character set.

From 11g onwards the DBCA by default offers a list of "recommended" character sets to use as NLS_CHARACTERSET, which, for the same reason as above, does not include character sets like WE8ISO8859P1. Note that ALL character sets are still SUPPORTED, and "non-recommended" character sets can be used if needed.

Pre-checks before going for migration

Before starting the character set conversion, perform the following pre-checks against the database:

1. Invalid objects
2. Orphaned datapump master tables (10g and up)
3. Objects in the recyclebin (10g and up)
4. Leftover temporary tables using CHAR semantics

Log on to the database as SYSDBA to perform the following steps.


A BACKUP OF THE DATABASE MUST BE TAKEN BEFORE THIS ACTIVITY.

1) Invalid objects

Issue the following SQL statement to check for invalid objects:

SQL> col object_name for a40
SQL> select owner,object_name from dba_objects where status='INVALID';

The above SQL statement lists all schemas that contain INVALID objects. These invalid objects need to be compiled, or dropped if they are unused.

SQL> exec utl_recomp.recomp_serial('SCHEMA');   (for a single schema)
SQL> @$ORACLE_HOME/rdbms/admin/utlrp.sql        (for all schemas)
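
After the recompile completes, a quick re-check (a sketch reusing the query above) should return zero invalid objects before you proceed:

SQL> select count(*) from dba_objects where status='INVALID';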

 2) Orphaned datapump master tables (10g and up)

To check, issue the following SQL statement:

SQL> SELECT o.status, o.object_id, o.object_type,
            o.owner||'.'||object_name "OWNER.OBJECT"
     FROM dba_objects o, dba_datapump_jobs j
     WHERE o.owner=j.owner_name
       AND o.object_name=j.job_name
       AND j.job_name NOT LIKE 'BIN$%'
     ORDER BY 4,2;

If this returns "no rows selected", proceed to the next step. Otherwise, check

Note: 336014.1 "How To Cleanup Orphaned DataPump Jobs In DBA_DATAPUMP_JOBS"

 3) Objects in the recyclebin (10g and up)

SQL> SELECT OWNER, ORIGINAL_NAME, OBJECT_NAME, TYPE from dba_recyclebin order by 1,2;

If there are objects in the recycle bin, then perform:

SQL> PURGE DBA_RECYCLEBIN;

This removes the unneeded objects; otherwise an ORA-38301 will be raised during CSALTER.

 4) Leftover temporary tables using CHAR semantics

SQL> select C.owner ||'.'|| C.table_name ||'.'|| C.column_name ||' ('||
            C.data_type ||' '|| C.char_length ||' CHAR)'
     from all_tab_columns C
     where C.char_used = 'C'
       and C.table_name in (select table_name from dba_tables where temporary='Y')
       and C.data_type in ('VARCHAR2', 'CHAR')
     order by 1;

If this returns "no rows selected", proceed to the next step. Otherwise, check

Note: 4157602.8 DBMS_STATS "ORA_TEMP_%_DS_%" temporary tables not cleaned up.

NOTE: Also make sure that no "binary" (encrypted) data is stored in character data types (CHAR, VARCHAR2, LONG, CLOB).


Running csscan

$ . oraenv    (enter the db_name when prompted)
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib
$ csscan FULL=Y FROMCHAR=<current NLS_CHARACTERSET> TOCHAR=<new NLS_CHARACTERSET> LOG=CSSCANOUTPUT CAPTURE=Y ARRAY=1000000 PROCESS=2
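
For illustration, a hedged example of a scan from WE8ISO8859P1 to AL32UTF8 (substitute your own source and target character sets; csscan prompts for a username/password when no credentials are given on the command line):

$ csscan FULL=Y FROMCHAR=WE8ISO8859P1 TOCHAR=AL32UTF8 LOG=CSSCANOUTPUT CAPTURE=Y ARRAY=1000000 PROCESS=2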

This will create 3 files:

csscanoutput.out: A log of the output of csscan
csscanoutput.txt: Database Scan Summary Report
csscanoutput.err: Contains the rowids of the Convertible and Lossy rows reported in csscanoutput.txt

It is very important to go through the status of the data in the "Conversion Summary" of the .txt file; this is the first thing to look at. It contains 4 columns, each representing a possible status for the data:

1) CHANGELESS data.

The code points of this data do not change during the character set conversion. In other words, all these characters use the SAME codes/code points in the new character set as in the old one. When using Csalter (or, for 8i/9i, "Alter Database Character Set") this data needs no action.

2) CONVERTIBLE data.

This data is valid, but the characters will map to a different code point in the new character set. When using Csalter / "Alter Database Character Set", any user/application data that is "convertible" needs to be exported and truncated/deleted before the change of the character set, and imported afterwards.

 3) TRUNCATION data.

This is a special case of "convertible" data: the number of BYTES the data uses will expand during the conversion, and the column it presently occupies is not large enough to hold the data post-conversion. Typically this happens when going from a single-byte character set (where one character is one byte) to a multi-byte character set (where one character can be one, two or more bytes) like Unicode (AL32UTF8/UTF8). If you do not take action, you will see ORA-01401: inserted value too large for column (or, from 10g onwards, ORA-12899: value too large for column) during import.

Truncation in Data Dictionary objects is very rare and will be solved by using the steps for convertible Data Dictionary data.
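
For application data, one common way to handle "Truncation" is to widen the affected column, or to switch it to character-length semantics, before the export/import. A minimal sketch, assuming a hypothetical column APPUSER.EMP.ENAME reported as Truncation:

SQL> -- switching from BYTE to CHAR semantics, so 10 characters still fit
SQL> -- even after each character expands to multiple bytes in the new character set
SQL> ALTER TABLE appuser.emp MODIFY (ename VARCHAR2(10 CHAR));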

 4) LOSSY data.

This data does not contain valid code points for the source NLS_CHARACTERSET, or the target character set you specified does not define those characters. If no action is taken, this data will be 'lost' in the conversion: every "lossy" character will be converted to the "default replacement character". What that "default replacement character" is depends on the character set; for US7ASCII it is the question mark "?", and for xx8ISO8859Px and xx8MSWIN125x character sets it is the inverted question mark "¿".

Please note that every lossy character is converted to the same "default replacement character", hence once you have done a conversion that resulted in lossy data there is no way to "recover" the original data. For example, if you have a US7ASCII database that contains the string "é? ç" (three characters that are not defined in US7ASCII) and you exp/imp this into a WE8MSWIN1252 database, it becomes "¿¿¿" in the WE8MSWIN1252 database. Since you now have the same character three times for three different "source" characters, there is no way after the conversion to work out what was supposed to be there; you need to go back to the source database for this. This is not a bug; it is simply how character sets work and is standard behavior.

So this is an important column to watch, because this data needs to be addressed to preserve the content and prevent data loss.

What to do if you have LOSSY data?

When using Csscan to go from one character set to another, Csscan relies on both the source and target character set definitions to determine whether a character can be converted. If a csscan FROMCHAR=<current NLS_CHARACTERSET> TOCHAR=<new NLS_CHARACTERSET> reports "lossy" data, this can mean one of two things:

1) The lossy characters are not known in the target/new character set (less likely to happen)

If you do a Csscan FROMCHAR=<current NLS_CHARACTERSET> TOCHAR=<current NLS_CHARACTERSET> and you do NOT see any "lossy" data, then you are in this situation. The only solutions are to choose another target character set or to accept that you will lose this data.

2) The source database has data (codes) that are NOT defined in the source characterset.

If you do a Csscan FROMCHAR=<current NLS_CHARACTERSET> TOCHAR=<current NLS_CHARACTERSET> and you DO see "lossy" data, then you are in this situation. This is the hard case: you need to figure out the ACTUAL encoding/character set of the data inside your database. Csscan can help here, but it can never be considered conclusive; it can only check whether a certain code is known in a character set, and it has no idea of the intended meaning of those codes.
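
One common way to inspect the raw bytes behind a suspect value is the DUMP function. A sketch, assuming a hypothetical column APPUSER.EMP.ENAME and a rowid taken from the csscan .err file:

SQL> -- format 1016 returns the bytes in hex plus the column's character set name
SQL> SELECT ename, DUMP(ename, 1016) FROM appuser.emp WHERE rowid = '<rowid from the .err file>';

Comparing the hex bytes against the code pages you suspect (for example, bytes in the 0x80-0x9F range that match Windows-1252 punctuation) often reveals the real client encoding.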

(We had the same issue with WSCWEBT1; below is the action to take in such a case.)
Most of the time we know that we are using Windows clients, so a reasonable assumption is that the data is in the encoding of those Windows clients (which are all US Windows systems, i.e. code page 1252). So you do a Csscan FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252, and if Csscan does not report any "lossy" data, you change the database character set declaration to WE8MSWIN1252 using Csalter or "alter database character set" to correct the lossy data.
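
A sketch of that verification scan, reusing the same csscan options as before:

$ csscan FULL=Y FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252 LOG=CHECK1252 CAPTURE=Y ARRAY=1000000 PROCESS=2

If this scan reports all data as "Changeless", the data is valid WE8MSWIN1252 and only the declaration needs to be corrected.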


What to do if you have CONVERTIBLE data?

There should not be any convertible data in the data dictionary tables. If there is, follow Note 258904.1 Solving Convertible or Lossy data in Data Dictionary objects when changing the NLS_CHARACTERSET

CSSCAN references SYS.SCHEDULER$_EVENT_LOG and SYS.HISTGRM$ (Doc ID 727359.1)

If there is convertible data in application tables, those tables need to be exported and then truncated/deleted.

Always run a new scan after this. If the scan is clean, you can proceed with the conversion; clean here means no "Lossy" and no "Convertible" data for the dictionary as well as the application tables. The tables can be imported again after the character set conversion, as sketched below.
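
A hedged sketch of that export/truncate/import cycle, assuming a hypothetical convertible table APPUSER.MESSAGES and the standard DATA_PUMP_DIR directory object:

$ expdp system TABLES=appuser.messages DIRECTORY=DATA_PUMP_DIR DUMPFILE=messages.dmp LOGFILE=messages_exp.log

SQL> TRUNCATE TABLE appuser.messages;

(rerun csscan, perform the character set change, then:)

$ impdp system TABLES=appuser.messages DIRECTORY=DATA_PUMP_DIR DUMPFILE=messages.dmp LOGFILE=messages_imp.log TABLE_EXISTS_ACTION=APPEND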

The select below will only list all "Convertible" columns if CAPTURE=Y was used in the CSSCAN run. If you used CAPTURE=N (the default), the result set of the select below will NOT list the "Convertible" data; only the "Truncation" data will be listed. Regardless of this, the .txt file always lists the convertible user data, whatever the setting of CAPTURE. The output of the query is to be sent to the Apps DBA, who will be asked to export the listed tables and truncate them afterwards.

 SELECT DISTINCT z.owner_name
|| '.'
|| z.table_name
|| '('
|| z.column_name
|| ') - '
|| z.column_type
|| ' - '
|| z.error_type
|| ' ' UserColumns
FROM csmig.csmv$errors z
WHERE z.owner_name NOT IN
(SELECT DISTINCT username FROM csmig.csm$dictusers)
ORDER BY UserColumns
/

In 10g and up, check at least the "Expansion" column in the .txt file, found under [Database Size], and make sure you have at least 2 times the listed expansion free in the SYSTEM tablespace. This is the size needed for Csalter to update the Data Dictionary CLOBs; otherwise you will see errors like "ORA-01691: unable to extend lob segment SYS.SYS_LOB0000058943C00039$$ by 1598 in tablespace SYSTEM" during Csalter.

In general (for any version) it is a good idea to check the "Expansion" column and verify that there is enough space in each listed tablespace, for example with a query like the one below.
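
A sketch of the free-space check for the SYSTEM tablespace (compare the result with the expansion figures csscan reports; repeat for each listed tablespace):

SQL> SELECT tablespace_name, ROUND(SUM(bytes)/1024/1024) free_mb
     FROM dba_free_space
     WHERE tablespace_name = 'SYSTEM'
     GROUP BY tablespace_name;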

All is now set for the conversion.

Oracle recommends running Csalter for 10g and higher versions, and ALTER DATABASE CHARACTER SET for 9i and lower versions. Please follow this strictly.

12c onward

For 12c and above, please use DMU (Database Migration Assistant for Unicode) to change the database character set.

NOTE: DMU does not check for any loss of data during the conversion, so please run csscan to check for any possible data loss.

For 10g and 11g

Csalter.plb needs to be run within 7 days after the csscan run, otherwise you will get the 'The CSSCAN result has expired' message.

Note down the current values of the "job_queue_processes" and "aq_tm_processes" parameters; these will have to be restored after the conversion (see the sketch below). Then run csalter.plb in restricted mode.
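
A short sketch of recording those values; after the "startup restrict" below they can be set to 0 in memory, mirroring the 9i steps further down:

SQL> show parameter job_queue_processes
SQL> show parameter aq_tm_processes
SQL> -- after startup restrict:
SQL> alter system set job_queue_processes=0 scope=memory;
SQL> alter system set aq_tm_processes=0 scope=memory;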

shutdown immediate;
startup restrict;

SPOOL Nswitch.log
@?/rdbms/admin/csalter.plb
SPOOL off

shutdown;
startup;
select * from NLS_DATABASE_PARAMETERS where parameter = 'NLS_CHARACTERSET';

 9i and below

shutdown immediate;
startup mount;
alter system enable restricted session;
alter system set job_queue_processes=0 scope=memory;
alter system set aq_tm_processes=0 scope=memory;
alter database open;
alter database character set internal_use <new NLS_CHARACTERSET>;
shutdown immediate;
startup;
select * from NLS_DATABASE_PARAMETERS where parameter = 'NLS_CHARACTERSET';

FOR 9i databases

The Character Set Scanner should report only "Changeless" data in both the data dictionary and in application data. If any "Convertible" or "Exceptional" data is reported, the ALTER DATABASE [NATIONAL] CHARACTER SET statement must NOT be used without further investigation of the source and type of that data.

You can learn more about Oracle database administration from these books:

Pro Oracle GoldenGate for the DBA

Oracle Security Tools Installation and Configuration

Expert Oracle Exadata 

Oracle Database 12c Release 2 Real Application Clusters Handbook: Concepts, Administration, Tuning & Troubleshooting

Expert Oracle RAC 12c

============================================================

Please check our blog for changing National Character Set. 

