iconv library

The recode library itself contains most code and tables from the portable
iconv library, written by Bruno Haible. In fact, many capabilities of the
recode library are duplicated because of this merging, as the older recode
and iconv libraries share many charsets. We discuss here the issues related
to this duplication, and other peculiarities specific to the iconv library.
The plan is to remove duplications and better merge specificities as recode
evolves.

As implemented, if a recoding request can be satisfied by the recode library
both with and without its iconv library part, it is likely that the iconv
library will be used. To find out whether the iconv library is indeed used
or not, just use the ‘-v’ or ‘--verbose’ option; see Controlling how files
are recoded.

The :libiconv: charset represents a conceptual pivot charset within the
iconv part of the recode library (in fact, this pivot exists, but is not
directly reachable). This charset has a mere : (a colon) for an alias. It is
not allowed to recode from or to this charset directly. But when this
charset is selected as an intermediate, usually by automatic means, then the
iconv part of the recode library is called to handle the transformations.

By using an ‘--ignore=:libiconv:’ option on the recode call or equivalently,
but more simply, ‘-x:’, recode is instructed to fully avoid this charset as
an intermediate, with the consequence that the iconv part of the library is
defeated. Consider these two calls:

    recode l1..1250 < input > output
    recode -x: l1..1250 < input > output

Both should transform input from ISO-8859-1 to CP1250 on output. The first
call uses the iconv part of the library, while the second call avoids it.
Whichever path is used, the results should normally be identical. However,
there might be observable differences. Most of them would result from
reversibility issues, as the iconv engine, which the recode library directly
uses for the time being, does not address reversibility. Much less likely,
some differences might result from slight errors in the tables used; such
differences should then be reported as bugs.
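
One way to check whether the two paths agree on a given file is to run both
calls and compare their outputs byte for byte. The following is only a
sketch under stated assumptions: the function name compare_paths and its
return codes are illustrative and not part of recode, and it assumes that
recode, mktemp and cmp are available on the PATH.

```shell
# compare_paths FILE (hypothetical helper, not part of recode):
# recode FILE from ISO-8859-1 to CP1250 once with the default path and once
# with ‘-x:’, then compare the two results.
# Prints "identical" and returns 0 when the outputs match, prints
# "different" and returns 1 when they do not, returns 2 if a call fails.
compare_paths() {
    input=$1
    tmpdir=$(mktemp -d) || return 2
    status=0
    if recode l1..1250 < "$input" > "$tmpdir/with" &&
       recode -x: l1..1250 < "$input" > "$tmpdir/without"; then
        if cmp -s "$tmpdir/with" "$tmpdir/without"; then
            echo "identical"
        else
            echo "different"
            status=1
        fi
    else
        status=2
    fi
    rm -rf "$tmpdir"
    return $status
}
```

Calling compare_paths input on a representative file gives a quick signal;
a reported difference still has to be examined by hand to tell a
reversibility effect from a genuine table error.
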

Other irregularities might be seen in the area of error detection and
recovery. The recode library usually tries to detect canonicity errors in
input, and production of ambiguous output, but the iconv part of the library
currently does not. Input is always validated, however. The recode library
may not always react properly when its iconv part has no translation for a
given character.

Within a collection of names for a single charset, the recode library
distinguishes one of them as being the genuine charset name, while the
others are said to be aliases. When recode lists all charsets, for example
with the ‘-l’ or ‘--list’ option, the list integrates all iconv library
charsets. The selection of one of the aliases as the genuine charset name is
an artifact added by recode; it does not come from iconv. Moreover, the
recode library dynamically resolves some conflicts when it initialises
itself at runtime. This might explain some discrepancies in the table below
as to which name is the genuine charset name.
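
Since the genuine name is decided by recode at runtime, it can be useful to
ask the installed program rather than rely on the table. The sketch below
assumes that each line printed by ‘recode -l’ holds one charset, with the
genuine name first and its aliases after it, separated by spaces, and that a
name may carry a "/surface" suffix; this layout is an assumption and may
differ between versions. The function name genuine_name is illustrative and
not part of recode.

```shell
# genuine_name ALIAS (hypothetical helper, not part of recode):
# print the first name on the ‘recode -l’ line that lists ALIAS.
# Assumes one charset per line, genuine name first, aliases after it.
# Prints nothing when no line mentions ALIAS.
genuine_name() {
    recode -l | awk -v want="$1" '
        BEGIN { want = tolower(want) }
        {
            for (i = 1; i <= NF; i++) {
                name = $i
                sub(/\/.*/, "", name)      # drop any "/surface" suffix
                if (tolower(name) == want) {
                    first = $1
                    sub(/\/.*/, "", first)
                    print first
                    exit
                }
            }
        }'
}
```

For instance, genuine_name L1 would print whichever name the local recode
treats as genuine for that alias. The comparison ignores case.
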

US-ASCII
    ASCII, ISO646-US, ISO_646.IRV:1991, ISO-IR-6, ANSI_X3.4-1968, CP367,
    IBM367, US, csASCII and ISO646.1991-IRV are aliases for this charset.
UTF-8
    UTF8 is an alias for this charset.
UCS-2
    ISO-10646-UCS-2 and csUnicode are aliases for this charset.
UCS-2BE
    UNICODEBIG, UNICODE-1-1 and csUnicode11 are aliases for this charset.
UCS-2LE
    UNICODELITTLE is an alias for this charset.
UCS-4
    ISO-10646-UCS-4 and csUCS4 are aliases for this charset.
UCS-4BE
UCS-4LE
UTF-16
UTF-16BE
UTF-16LE
UTF-7
    UNICODE-1-1-UTF-7 and csUnicode11UTF7 are aliases for this charset.
UCS-2-INTERNAL
UCS-2-SWAPPED
UCS-4-INTERNAL
UCS-4-SWAPPED
JAVA
ISO-8859-1
    ISO_8859-1, ISO_8859-1:1987, ISO-IR-100, CP819, IBM819, LATIN1, L1,
    csISOLatin1, ISO8859-1 and ISO8859_1 are aliases for this charset.
ISO-8859-2
    ISO_8859-2, ISO_8859-2:1987, ISO-IR-101, LATIN2, L2, csISOLatin2,
    ISO8859-2 and ISO8859_2 are aliases for this charset.
ISO-8859-3
    ISO_8859-3, ISO_8859-3:1988, ISO-IR-109, LATIN3, L3, csISOLatin3,
    ISO8859-3 and ISO8859_3 are aliases for this charset.
ISO-8859-4
    ISO_8859-4, ISO_8859-4:1988, ISO-IR-110, LATIN4, L4, csISOLatin4,
    ISO8859-4 and ISO8859_4 are aliases for this charset.
ISO-8859-5
    ISO_8859-5, ISO_8859-5:1988, ISO-IR-144, CYRILLIC, csISOLatinCyrillic,
    ISO8859-5 and ISO8859_5 are aliases for this charset.
ISO-8859-6
    ISO_8859-6, ISO_8859-6:1987, ISO-IR-127, ECMA-114, ASMO-708, ARABIC,
    csISOLatinArabic, ISO8859-6 and ISO8859_6 are aliases for this charset.
ISO-8859-7
    ISO_8859-7, ISO_8859-7:1987, ISO-IR-126, ECMA-118, ELOT_928, GREEK8,
    GREEK, csISOLatinGreek, ISO8859-7 and ISO8859_7 are aliases for this
    charset.
ISO-8859-8
    ISO_8859-8, ISO_8859-8:1988, ISO-IR-138, HEBREW, csISOLatinHebrew,
    ISO8859-8 and ISO8859_8 are aliases for this charset.
ISO-8859-9
    ISO_8859-9, ISO_8859-9:1989, ISO-IR-148, LATIN5, L5, csISOLatin5,
    ISO8859-9 and ISO8859_9 are aliases for this charset.
ISO-8859-10
    ISO_8859-10, ISO_8859-10:1992, ISO-IR-157, LATIN6, L6, csISOLatin6 and
    ISO8859-10 are aliases for this charset.
ISO-8859-13
    ISO_8859-13, ISO-IR-179, LATIN7 and L7 are aliases for this charset.
ISO-8859-14
    ISO_8859-14, ISO_8859-14:1998, ISO-IR-199, LATIN8 and L8 are aliases
    for this charset.
ISO-8859-15
    ISO_8859-15, ISO_8859-15:1998 and ISO-IR-203 are aliases for this
    charset.
ISO-8859-16
    ISO_8859-16, ISO_8859-16:2000 and ISO-IR-226 are aliases for this
    charset.
KOI8-R
    csKOI8R is an alias for this charset.
KOI8-U
KOI8-RU
CP1250
    WINDOWS-1250 and MS-EE are aliases for this charset.
CP1251
    WINDOWS-1251 and MS-CYRL are aliases for this charset.
CP1252
    WINDOWS-1252 and MS-ANSI are aliases for this charset.
CP1253
    WINDOWS-1253 and MS-GREEK are aliases for this charset.
CP1254
    WINDOWS-1254 and MS-TURK are aliases for this charset.
CP1255
    WINDOWS-1255 and MS-HEBR are aliases for this charset.
CP1256
    WINDOWS-1256 and MS-ARAB are aliases for this charset.
CP1257
    WINDOWS-1257 and WINBALTRIM are aliases for this charset.
CP1258
    WINDOWS-1258 is an alias for this charset.
ARMSCII-8
Georgian-Academy
Georgian-PS
MuleLao-1
CP1133
    IBM-CP1133 is an alias for this charset.
TIS-620
    TIS620, TIS620-0, TIS620.2529-1, TIS620.2533-0, TIS620.2533-1 and
    ISO-IR-166 are aliases for this charset.
CP874
    WINDOWS-874 is an alias for this charset.
VISCII
    VISCII1.1-1 and csVISCII are aliases for this charset.
TCVN
    TCVN-5712, TCVN5712-1 and TCVN5712-1:1993 are aliases for this charset.
JIS_C6220-1969-RO
    ISO646-JP, ISO-IR-14, JP and csISO14JISC6220ro are aliases for this
    charset.
JIS_X0201
    JISX0201-1976, X0201, csHalfWidthKatakana, JISX0201.1976-0 and JIS0201
    are aliases for this charset.
JIS_X0208
    JIS_X0208-1983, JIS_X0208-1990, JIS0208, X0208, ISO-IR-87,
    csISO87JISX0208, JISX0208.1983-0 and JISX0208.1990-0 are aliases for
    this charset.
JIS_X0212
    JIS_X0212.1990-0, JIS_X0212-1990, X0212, ISO-IR-159,
    csISO159JISX02121990, JISX0212.1990-0 and JIS0212 are aliases for this
    charset.
GB_1988-80
    ISO646-CN, ISO-IR-57, CN and csISO57GB1988 are aliases for this charset.
GB_2312-80
    ISO-IR-58, csISO58GB231280, CHINESE and GB2312.1980-0 are aliases for
    this charset.
ISO-IR-165
    CN-GB-ISOIR165 is an alias for this charset.
KSC_5601
    KS_C_5601-1987, KS_C_5601-1989, ISO-IR-149, csKSC56011987, KOREAN,
    KSC5601.1987-0 and KSX1001:1992 are aliases for this charset.
EUC-JP
    EUCJP, Extended_UNIX_Code_Packed_Format_for_Japanese,
    csEUCPkdFmtJapanese and EUC_JP are aliases for this charset.
SJIS
    SHIFT_JIS, SHIFT-JIS, MS_KANJI and csShiftJIS are aliases for this
    charset.
CP932
ISO-2022-JP
    csISO2022JP and ISO2022JP are aliases for this charset.
ISO-2022-JP-1
ISO-2022-JP-2
    csISO2022JP2 is an alias for this charset.
EUC-CN
    EUCCN, GB2312, CN-GB, csGB2312 and EUC_CN are aliases for this charset.
GBK
    CP936 is an alias for this charset.
GB18030
ISO-2022-CN
    csISO2022CN and ISO2022CN are aliases for this charset.
ISO-2022-CN-EXT
HZ
    HZ-GB-2312 is an alias for this charset.
EUC-TW
    EUCTW, csEUCTW and EUC_TW are aliases for this charset.
BIG5
    BIG-5, BIG-FIVE, BIGFIVE, CN-BIG5 and csBig5 are aliases for this
    charset.
CP950
BIG5HKSCS
EUC-KR
    EUCKR, csEUCKR and EUC_KR are aliases for this charset.
CP949
    UHC is an alias for this charset.
JOHAB
    CP1361 is an alias for this charset.
ISO-2022-KR
    csISO2022KR and ISO2022KR are aliases for this charset.
CHAR
WCHAR_T