Oracle Character set change
Refer to the following note for migrating the character set of an Oracle Applications database:
Note 124721.1 (Migrating an Applications Installation to a New Character Set)
(In case of any issues or doubts it is better to log an SR for assistance.)
You should always check the data with the Character Set Scanner (Csscan) before making any changes to your character set, even when using exp/imp or expdp/impdp.
Note 458122.1 Installing and Configuring Csscan in 8i and 9i (Database Character Set Scanner)
Note 745809.1 Installing and Configuring Csscan in 10g and 11g (Database Character Set Scanner)
Note 444701.1 Csscan Output Explained
For the majority of customers a Unicode character set (AL32UTF8) is the best choice. See Note 333489.1 (Choosing a Database Character Set Means Choosing Unicode).
For non-Unicode character sets the best choice is an xx8MSWIN125x character set, even if the database itself runs on a Unix platform. The reason is simply that the majority of clients are Windows-based systems, so the best non-Unicode character set for a database is one that can store all the characters known by those clients, which means an xx8MSWIN125x character set.
From 11g onwards the DBCA offers by default a list of "recommended" character sets to use as NLS_CHARACTERSET which, for the same reason as above, does not include character sets like WE8ISO8859P1. Note that ALL character sets are still SUPPORTED, and "non-recommended" character sets can be used if needed.
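To check what your database currently uses, you can query the standard NLS_DATABASE_PARAMETERS view, for example:
SQL> select parameter, value from nls_database_parameters
     where parameter in ('NLS_CHARACTERSET','NLS_NCHAR_CHARACTERSET');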
Pre-checks before migration
Before starting the character set conversion, we need to perform the following pre-checks against the database.
1. Invalid objects
2. Orphaned datapump master tables (10g and up)
3. Objects in the recyclebin (10g and up)
4. Leftover temporary tables using CHAR semantics
Note: log on to the database as SYSDBA to run the following steps.
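For example:
$ sqlplus / as sysdba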
A BACKUP OF THE DATABASE MUST BE TAKEN BEFORE THIS ACTIVITY.
1) Invalid objects
Issue the following SQL statement to check for invalid objects:
SQL> col object_name for a40
SQL> select owner, object_name from dba_objects where status='INVALID';
The above SQL statement lists all schemas that contain INVALID objects. These invalid objects need to be compiled, or dropped if they are unused.
SQL> exec utl_recomp.recomp_serial('SCHEMA'); (for a single schema)
SQL> @?/rdbms/admin/utlrp.sql (for all schemas)
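After recompiling, you can re-run a quick count against DBA_OBJECTS to confirm nothing is left invalid, for example:
SQL> select count(*) from dba_objects where status='INVALID';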
2) Orphaned datapump master tables (10g and up)
To check, issue the following SQL statement:
SQL> SELECT o.status, o.object_id, o.object_type,
            o.owner||'.'||object_name "OWNER.OBJECT"
     FROM dba_objects o, dba_datapump_jobs j
     WHERE o.owner = j.owner_name
       AND o.object_name = j.job_name
       AND j.job_name NOT LIKE 'BIN$%'
     ORDER BY 4, 2;
If "no rows selected", proceed to the next step. Otherwise, check Note 336014.1 "How To Cleanup Orphaned DataPump Jobs In DBA_DATAPUMP_JOBS".
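As a sketch of the cleanup described in Note 336014.1: for a job that is confirmed as NOT RUNNING, dropping its leftover master table removes the orphaned entry (the owner and job name below are placeholders; review the note before running this):
SQL> DROP TABLE <owner>.<job_name> PURGE;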
3) Objects in the recyclebin (10g and up)
SQL> SELECT owner, original_name, object_name, type FROM dba_recyclebin ORDER BY 1, 2;
If there are objects in the recyclebin, then perform:
SQL> PURGE DBA_RECYCLEBIN;
This removes unneeded objects; otherwise an ORA-38301 will be raised during Csalter.
4) Leftover temporary tables using CHAR semantics
SQL> select C.owner ||'.'|| C.table_name ||'.'|| C.column_name ||' ('|| C.data_type ||' '|| C.char_length ||' CHAR)'
     from all_tab_columns C
     where C.char_used = 'C'
       and C.table_name in (select table_name from dba_tables where temporary = 'Y')
       and C.data_type in ('VARCHAR2', 'CHAR')
     order by 1;
If "no rows selected", proceed to the next step. Otherwise, check Note 4157602.8 (DBMS_STATS "ORA_TEMP_%_DS_%" temporary tables not cleaned up).
NOTE: Also make sure that no "binary" (encrypted) data is stored in character data types (CHAR, VARCHAR2, LONG, CLOB).
Running csscan
$ . oraenv (enter the database name when prompted)
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib
$ csscan FULL=Y FROMCHAR=<current NLS_CHARACTERSET> TOCHAR=<new NLS_CHARACTERSET> LOG=CSSCANOUTPUT CAPTURE=Y ARRAY=1000000 PROCESS=2
This will create 3 files:
csscanoutput.out: a log of the output of csscan
csscanoutput.txt: the Database Scan Summary Report
csscanoutput.err: contains the ROWIDs of the Convertible and Lossy rows reported in csscanoutput.txt
The "Conversion Summary" in the .txt file is the first and most important thing to look at. It shows 4 columns, each representing a possible status for the data:
1) CHANGELESS data.
This data does not change code point during the character set conversion. In other words, all those characters use the SAME codes/code points in the new character set as in the old character set. When using Csalter, or "Alter Database Character Set" for 8i/9i, this data needs no action.
2) CONVERTIBLE data.
This data is valid, but the characters will change to a different code point in the new character set. When using Csalter / "Alter Database Character Set", any user/application data that is "convertible" needs to be exported and truncated/deleted before the change of the character set, and imported afterwards.
3) TRUNCATION data.
This is a special case of "convertible" data, meaning that the amount of BYTES the data uses will expand in the conversion, and the field it presently occupies is not large enough to hold the data post-conversion. Typically this happens when going from a single-byte character set (where one character is one byte) to a multi-byte character set (where one character can be one, two or more bytes) like Unicode (AL32UTF8/UTF8). If you do not take action, you will see ORA-01401: inserted value too large for column, or from 10g onwards ORA-12899: value too large for column, during import.
Truncation in Data Dictionary objects is very rare and will be solved by using the steps for convertible Data Dictionary data.
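One common way to handle "Truncation" reported on application data before the conversion (this remedy is not part of the scan itself, and the table and column below are illustrative placeholders) is to enlarge the affected column, or switch it to CHAR length semantics so the expanded byte length still fits:
SQL> alter table scott.emp modify (ename varchar2(10 char));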
4) LOSSY data.
This data is not a valid code point for the source NLS_CHARACTERSET, or the target character set you specified does not define that character. If no action is taken, this data will be 'lost' in the conversion: every "lossy" character will be converted to the "default replacement character". What that "default replacement character" is depends on the character sets: for US7ASCII it is a question mark "?", and for xx8ISO8859Px and xx8MSWIN125x character sets it is the inverted question mark "¿".
Please note that every lossy character is converted to the same "default replacement character", hence once you have done a conversion that resulted in lossy data there is no way to "recover" the original data. If, for example, you have a US7ASCII database that contains the string "é? ç" (= 3 characters that are not defined in US7ASCII) and you exp/imp this into a WE8MSWIN1252 database, then this will become "¿¿¿" in the WE8MSWIN1252 database. Since you end up with the same replacement character for each "source" character, there is no way after the conversion to work out what was supposed to be there; you will need to go back to the source database for this. This is not a bug, it is simply how character sets work, and it is standard behavior.
So this is an important column to watch, because this data needs to be addressed to preserve the content and prevent data loss.
What to do if you have LOSSY data?
When using Csscan to go from one character set to another, Csscan depends on both the source and target character set declarations to determine whether a character can be converted. If a csscan FROMCHAR=<current NLS_CHARACTERSET> TOCHAR=<new NLS_CHARACTERSET> gives "lossy" data, this can mean one of 2 things:
1) The data is defined in the current character set, but the new character set does not define those characters.
If you do a Csscan FROMCHAR=<current NLS_CHARACTERSET> TOCHAR=<current NLS_CHARACTERSET> and you do NOT have any "lossy", then you have this situation. The only solution is to choose another target character set, or accept that you will lose this data.
2) The source database has data (codes) that are NOT defined in the source character set.
If you do a Csscan FROMCHAR=<current NLS_CHARACTERSET> TOCHAR=<current NLS_CHARACTERSET> and you DO have "lossy", then you have this situation. This is the hard case: you then need to figure out the ACTUAL encoding/character set of the data inside your database. Csscan can be of help here, but it can never be considered conclusive; Csscan can only check whether a certain code is known in a character set, it has no idea of the intended meaning of those codes.
(This is the same issue we had with WSCWEBT1, and below is the action we need to take in such a case.)
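To help determine the actual encoding, you can inspect the raw bytes stored in a suspect column with the DUMP function (the table and column names here are illustrative); format 1016 shows the byte values in hex together with the character set name:
SQL> select c1, dump(c1, 1016) from mytable where rownum <= 10;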
Most of the time we know that we are using Windows clients, so the best guess is that the data is in the encoding of those Windows clients (which are all US Windows systems using code page 1252). So you do a Csscan FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252, and if Csscan does not report any "lossy", you change the database to WE8MSWIN1252 using Csalter or "Alter Database Character Set" to correct the lossy data.
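A sketch of that verification run, using the same options as the Csscan command above with source and target both set to WE8MSWIN1252:
$ csscan FULL=Y FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252 LOG=CSSCANOUTPUT CAPTURE=Y ARRAY=1000000 PROCESS=2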
What to do if you have CONVERTIBLE data?
There should not be any convertible data in the dictionary tables. If there is, follow Note 258904.1 (Solving Convertible or Lossy data in Data Dictionary objects when changing the NLS_CHARACTERSET) and Note 727359.1 (CSSCAN references SYS.SCHEDULER$_EVENT_LOG and SYS.HISTGRM$).
If there is convertible data in application tables, then these tables must be exported and then truncated/deleted, as shown in the sketch below. Always run a new scan after this; if the scan is clean, you can proceed with the conversion. Clean here means no "Lossy" and no "Convertible" data for the dictionary as well as the application tables. The tables can be imported after the character set conversion.
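A minimal sketch of that export/truncate/import cycle, assuming a single convertible table SCOTT.EMP (an illustrative name) and the default DATA_PUMP_DIR directory:
$ expdp system TABLES=scott.emp DIRECTORY=DATA_PUMP_DIR DUMPFILE=conv_tables.dmp LOGFILE=exp_conv.log
SQL> truncate table scott.emp;
(after the clean re-scan and the character set change)
$ impdp system TABLES=scott.emp DIRECTORY=DATA_PUMP_DIR DUMPFILE=conv_tables.dmp LOGFILE=imp_conv.log TABLE_EXISTS_ACTION=APPEND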
The select below will only list all "Convertible" columns if CAPTURE=Y was used in Csscan. If you used CAPTURE=N (the default), the result set will NOT include the "Convertible" data; only the "Truncation" data will be listed. Regardless of this setting, the .txt file always lists the convertible user data. The output of the query is to be sent to the Apps DBA, who will be asked to take an export of the listed tables and truncate them thereafter.
SELECT DISTINCT z.owner_name
|| '.'
|| z.table_name
|| '('
|| z.column_name
|| ') - '
|| z.column_type
|| ' - '
|| z.error_type
|| ' ' UserColumns
FROM csmig.csmv$errors z
WHERE z.owner_name NOT IN
(SELECT DISTINCT username FROM
csmig.csm$dictusers)
ORDER BY UserColumns
/
In 10g and up, verify at least the "Expansion" column in the .txt file, found under [Database Size], and check that you have at least 2 times the listed expansion free in the SYSTEM tablespace. This is the space Csalter needs to update the Data Dictionary CLOBs. Otherwise you will see errors like "ORA-01691: unable to extend lob segment SYS.SYS_LOB0000058943C00039$$ by 1598 in tablespace SYSTEM" during Csalter.
In general (for any version) it is a good idea to check the "Expansion" column and verify that there is enough space in each listed tablespace.
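A quick way to compare that against the free space per tablespace, using the standard DBA_FREE_SPACE view:
SQL> select tablespace_name, round(sum(bytes)/1024/1024) free_mb
     from dba_free_space group by tablespace_name order by 1;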
All is now set for the conversion.
Oracle recommends running Csalter for 10g and higher versions, and ALTER DATABASE CHARACTER SET for 9i and lower versions. Please follow this strictly.
12c onward
For 12c and above, please use the DMU (Database Migration Assistant for Unicode) to change the national character set of the database.
NOTE: The DMU does not check for any loss of data during the conversion, so please run csscan to check for possible data loss.
For 10g and 11g
Csalter.plb needs to be run within 7 days after the csscan run, otherwise you will get the 'The CSSCAN result has expired' message.
shutdown immediate;
startup restrict;
SPOOL Nswitch.log
@?/rdbms/admin/csalter.plb
SPOOL off
shutdown;
startup;
select * from NLS_DATABASE_PARAMETERS where parameter like 'NLS_CHARACTERSET';
shutdown immediate;
startup mount;
alter system enable restricted session;
alter system set job_queue_processes=0 scope=memory;
alter system set aq_tm_processes=0 scope=memory;
alter database open;
alter database character set internal_use utf8;
shutdown immediate;
startup;
select * from NLS_DATABASE_PARAMETERS where parameter like 'NLS_CHARACTERSET';
For 9i databases
The Character Set Scanner should report only "Changeless" data in both the Data Dictionary and in the application data. If any Convertible or Exceptional data is reported, the ALTER DATABASE [NATIONAL] CHARACTER SET statement must NOT be used without further investigation of the source and type of this data.
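For reference, a sketch of the usual 9i restricted-session sequence, mirroring the 10g/11g block above; this assumes a fully changeless scan, and the new character set name is a placeholder:
shutdown immediate;
startup mount;
alter system enable restricted session;
alter system set job_queue_processes=0;
alter system set aq_tm_processes=0;
alter database open;
alter database character set <new NLS_CHARACTERSET>;
shutdown immediate;
startup;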
You can learn more about Oracle database administration:
Pro Oracle GoldenGate for the DBA
Oracle Security Tools Installation and Configuration
============================================================
Please check our blog for changing the National Character Set.