El TecnoBaúl de Kiquenet

Kiquenet boring stories

Posts Tagged ‘encoding’

osql.exe and UTF-8 encoding

Posted by kiquenet on 29 July 2010

osql accepts ANSI and Unicode encoded files, but when I opened my file in Notepad++ and changed the encoding to UTF-8, I got similar errors.

MS SQL Server’s osql utility (SQL Server 2000) is deprecated.

It is better to use sqlcmd.

I have been trying to run MS SQL Server’s osql utility (SQL Server 2000) to insert some default data. My target database collation is Croatian_CI_AS. After some time I realized a few things about the ways in which you can save the input file.

Using Notepad++ as my editor I thought that formatting a file as UTF-8 would cause everything to be wonderful and magically work.

Not so. In fact, osql will not even read a UTF-8 encoded file (this was a real surprise)! I kept getting an error message about the first line, with a strange character displayed in it. It turns out that osql accepts Unicode or ANSI encoded files, and there is no way to control the input file encoding from the command line, so we are stuck with that (which is not that bad after all).
So, no. 1: don’t save your input file for osql as UTF-8. That is a different animal from a Unicode encoded file.
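One way to see which of these encodings a script was saved in is to look at its first bytes for a byte order mark (BOM). The following is a minimal Python sketch, not something from the original post; the function name sniff_bom is hypothetical, and a file without a BOM may be ANSI or BOM-less UTF-8, which this check cannot tell apart.

```python
import codecs

# Known BOM signatures and the encoding each one implies.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes) -> str:
    """Return the encoding implied by a leading BOM, or 'no-bom' if none is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return "no-bom"


if __name__ == "__main__":
    sample = "SELECT 1".encode("utf-8-sig")  # UTF-8 with BOM, as many editors save it
    print(sniff_bom(sample))
```

Calling sniff_bom(open(path, "rb").read(4)) on a script before handing it to osql would flag the UTF-8 files described above.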

Notepad++ offers several encoding options. It turns out that UCS-2 Little Endian is the one that works. For those interested, read about UCS-2 on Wikipedia. Only then did I realize that it is a 16-bit Unicode encoding; just seeing it in the list of encodings in Notepad++ didn’t ring any bells.

I’m not sure why UCS-2 Big Endian encoding does not play nice (I get gibberish for most of my special characters).
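The difference between the two byte orders can be illustrated with a short Python sketch. It only shows what happens when big-endian bytes are interpreted as little-endian; whether this is what osql actually does internally is an assumption, not something the original post confirms.

```python
import codecs

text = "č"  # U+010D, a Croatian letter

le = text.encode("utf-16-le")  # bytes 0D 01 (low byte first)
be = text.encode("utf-16-be")  # bytes 01 0D (high byte first)

# Reading big-endian bytes as little-endian swaps the two bytes of every
# 16-bit unit, so U+010D turns into the unrelated code point U+0D01.
misread = be.decode("utf-16-le")

print(le.hex(), be.hex(), hex(ord(misread)))
print(codecs.BOM_UTF16_LE.hex(), codecs.BOM_UTF16_BE.hex())  # the two BOMs
```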

Saving the file as ANSI in Notepad++ and running it through osql did not work either, which was a bit of a surprise: osql executed everything, but special characters such as diacritics became gibberish. I even changed my Windows code page to Croatian in the hope that it would all somehow sort itself out, but it didn’t. From what I can tell, osql probably reads the ANSI encoded file using an English (Latin) code page and loses the special characters in the process.
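The kind of gibberish described here is what a code page mismatch produces. As a hedged illustration in Python, assume the file was saved with the Central European code page (1250) and then read with the Western European one (1252); these two code pages are an assumption chosen for the example, since the post only guesses at what osql does.

```python
croatian = "čćđšž"

# Bytes as an editor would write them with the Central European ANSI code page.
raw = croatian.encode("cp1250")

# The same bytes interpreted with the Western European ANSI code page.
as_western = raw.decode("cp1252")

print(as_western)  # š and ž happen to share byte values; č, ć and đ do not

# Code page 1252 has no slot for č at all, so encoding it fails outright.
try:
    "č".encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```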

So, bottom line: UCS-2 Little Endian is our friend. Use it for files that are to be executed with osql.
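If many scripts are already saved as UTF-8, re-saving them by hand is tedious. The following Python sketch converts a file to UTF-16 little endian with a BOM, which is what Notepad++ calls UCS-2 Little Endian for characters in the Basic Multilingual Plane. The function name to_utf16_le and the default source encoding are assumptions of this example.

```python
import codecs
from pathlib import Path


def to_utf16_le(src: Path, dst: Path, src_encoding: str = "utf-8-sig") -> None:
    """Re-encode src as UTF-16 LE with a BOM and write the result to dst.

    Bytes are read and written directly so that line endings stay untouched.
    "utf-8-sig" accepts UTF-8 input with or without a BOM.
    """
    text = src.read_bytes().decode(src_encoding)
    dst.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
```

For example, to_utf16_le(Path("defaults.sql"), Path("defaults.utf16.sql")) would produce a copy in the encoding recommended above (both file names are hypothetical).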

 

osql.exe and unicode files – how to save your sql scripts with encoding

osql.exe is a great application for running sql scripts in a batch.  I use a batch file to execute multiple sql scripts that I use to rebuild my current application database from scratch.  When developing a brand new application, deploying a database in this way makes it really easy to recreate the database just like it will be created on Day 1 when you build out the Production environment. This requires scripting out all of your sql objects and then also having a way to execute all of those sql scripts easily.  That is where osql.exe comes in handy.
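The batch approach described above can be sketched in Python instead of a batch file. This is only an illustration: the server name, database name, and the convention of ordering scripts by file name are assumptions, and the sketch only builds the command lines (using the osql switches -S, -E, -d, -b and -i) without running them.

```python
from pathlib import Path


def build_osql_commands(script_dir, server, database):
    """Return one osql command line per .sql file in script_dir, sorted by name."""
    scripts = sorted(Path(script_dir).glob("*.sql"))
    return [
        # -E: trusted connection, -b: return an error level on failure, -i: input file
        ["osql", "-S", server, "-E", "-d", database, "-b", "-i", str(script)]
        for script in scripts
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) to execute the scripts in order and stop at the first failure.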

But I ran into one issue with osql.exe this week: it does not like UTF-8 (codepage 65001) or UTF-7 (codepage 65000) encoded files. And sometimes you need to include Unicode characters in your SQL scripts. At first I thought osql just did not support Unicode, but that is not the case; it just does not like the UTF-8 or UTF-7 encodings.

Trying to run a UTF-8 (codepage 65001) or UTF-7 (codepage 65000) encoded file with osql.exe will give you errors such as:

Incorrect syntax near ‘+’.
Incorrect syntax near ‘ ‘.
Incorrect syntax near ‘ï’.
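A plausible reading of these messages is that osql treats the file as ANSI and trips over the encoding's own marker bytes. The Python sketch below shows that the UTF-8 byte order mark, read as code page 1252, begins with ‘ï’, and that UTF-7 introduces every non-ASCII run with ‘+’. That osql decodes the file this way is an assumption; the post does not say so.

```python
import codecs

# The three UTF-8 BOM bytes (EF BB BF) interpreted as Western European ANSI.
utf8_bom_as_ansi = codecs.BOM_UTF8.decode("cp1252")
print(utf8_bom_as_ansi)  # 'ï»¿'

# In UTF-7, any non-ASCII character is written as '+', base64 data, then '-'.
utf7_bytes = "č".encode("utf-7")
print(utf7_bytes[:1])  # the leading '+'
```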

Saving the same file with Unicode encoding (codepage 1200) works just fine. Here is how to save SQL scripts in Microsoft SQL Server Management Studio with a particular encoding (you can also use this method to see which encoding a file was saved with in the first place). Note that Visual Studio has the same Save As… functionality.

From the Microsoft SQL Server Management Studio (or Visual Studio) File menu, choose Save [FILENAME] As…


Then when the Save File As dialog appears you will see a little black arrow (inverted triangle) as part of the Save button.


Clicking just the inverted triangle portion of the button will give you a menu.


Choosing the Save with Encoding… option will then present you with an Advanced Save Options dialog.


Here is where you can specify the encoding to use for the file.  For osql.exe make sure you specify either Western European (Windows) – Codepage 1252 or Unicode – Codepage 1200.  Do not select UTF-8 (codepage 65001) or UTF-7 (codepage 65000) or osql.exe will give errors when trying to parse the file.

http://weblogs.asp.net/jeffwids/archive/2009/10/14/osql-exe-and-unicode-files-how-to-save-your-sql-scripts-with-encoding.aspx

http://kiribao.blogspot.com/2008/03/osql-and-input-file-encodings.html


Posted in Comandos, SQL

Encoding in .NET

Posted by kiquenet on 4 June 2010

http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx

Code pages

37: IBM EBCDIC (US-Canada), IBM037, IBM037
437: OEM United States, IBM437, IBM437
500: IBM EBCDIC (International), IBM500, IBM500
708: Arabic (ASMO 708), ASMO-708, ASMO-708
720: Arabic (DOS), DOS-720, DOS-720
737: Greek (DOS), ibm737, ibm737
775: Baltic (DOS), ibm775, ibm775
850: Western European (DOS), ibm850, ibm850
852: Central European (DOS), ibm852, ibm852
855: OEM Cyrillic, IBM855, IBM855
857: Turkish (DOS), ibm857, ibm857
860: Portuguese (DOS), IBM860, IBM860
861: Icelandic (DOS), ibm861, ibm861
862: Hebrew (DOS), DOS-862, DOS-862
863: French Canadian (DOS), IBM863, IBM863
864: Arabic (864), IBM864, IBM864
865: Nordic (DOS), IBM865, IBM865
866: Cyrillic (DOS), cp866, cp866
869: Greek, Modern (DOS), ibm869, ibm869
874: Thai (Windows), windows-874, windows-874
875: IBM EBCDIC (Greek Modern), cp875, cp875
932: Japanese (Shift-JIS), iso-2022-jp, iso-2022-jp
936: Chinese Simplified (GB2312), gb2312, gb2312
949: Korean, ks_c_5601-1987, ks_c_5601-1987
950: Chinese Traditional (Big5), big5, big5
1026: IBM EBCDIC (Turkish Latin-5), IBM1026, IBM1026
1200: Unicode, utf-16, utf-16
1201: Unicode (Big-Endian), unicodeFFFE, unicodeFFFE
1250: Central European (Windows), iso-8859-2, windows-1250
1251: Cyrillic (Windows), koi8-r, windows-1251
1252: Western European (Windows), iso-8859-1, Windows-1252
1253: Greek (Windows), iso-8859-7, windows-1253
1254: Turkish (Windows), iso-8859-9, windows-1254
1255: Hebrew (Windows), windows-1255, windows-1255
1256: Arabic (Windows), windows-1256, windows-1256
1257: Baltic (Windows), windows-1257, windows-1257
1258: Vietnamese (Windows), windows-1258, windows-1258
1361: Korean (Johab), Johab, Johab
10000: Western European (Mac), macintosh, macintosh
10001: Japanese (Mac), x-mac-japanese, x-mac-japanese
10002: Chinese Traditional (Mac), x-mac-chinesetrad, x-mac-chinesetrad
10003: Korean (Mac), x-mac-korean, x-mac-korean
10004: Arabic (Mac), x-mac-arabic, x-mac-arabic
10005: Hebrew (Mac), x-mac-hebrew, x-mac-hebrew
10006: Greek (Mac), x-mac-greek, x-mac-greek
10007: Cyrillic (Mac), x-mac-cyrillic, x-mac-cyrillic
10008: Chinese Simplified (Mac), x-mac-chinesesimp, x-mac-chinesesimp
10010: Romanian (Mac), x-mac-romanian, x-mac-romanian
10017: Ukrainian (Mac), x-mac-ukrainian, x-mac-ukrainian
10021: Thai (Mac), x-mac-thai, x-mac-thai
10029: Central European (Mac), x-mac-ce, x-mac-ce
10079: Icelandic (Mac), x-mac-icelandic, x-mac-icelandic
10081: Turkish (Mac), x-mac-turkish, x-mac-turkish
10082: Croatian (Mac), x-mac-croatian, x-mac-croatian
20000: Chinese Traditional (CNS), x-Chinese-CNS, x-Chinese-CNS
20127: US-ASCII, us-ascii, us-ascii
20261: T.61, x-cp20261, x-cp20261
20290: IBM EBCDIC (Japanese katakana), IBM290, IBM290
20866: Cyrillic (KOI8-R), koi8-r, koi8-r
20932: Japanese (JIS 0208-1990 and 0212-1990), EUC-JP, EUC-JP
20936: Chinese Simplified (GB2312-80), x-cp20936, x-cp20936
20949: Korean Wansung, x-cp20949, x-cp20949
21027: Ext Alpha Lowercase, x-cp21027, x-cp21027
21866: Cyrillic (KOI8-U), koi8-u, koi8-u
28591: Western European (ISO), iso-8859-1, iso-8859-1
28592: Central European (ISO), iso-8859-2, iso-8859-2
28594: Baltic (ISO), iso-8859-4, iso-8859-4
28595: Cyrillic (ISO), iso-8859-5, iso-8859-5
28596: Arabic (ISO), iso-8859-6, iso-8859-6
28597: Greek (ISO), iso-8859-7, iso-8859-7
28598: Hebrew (ISO-Visual), iso-8859-8, iso-8859-8
28599: Turkish (ISO), iso-8859-9, iso-8859-9
28603: Estonian (ISO), iso-8859-13, iso-8859-13
28605: Latin 9 (ISO), iso-8859-15, iso-8859-15
38598: Hebrew (ISO-Logical), iso-8859-8-i, iso-8859-8-i
50220: Japanese (JIS), iso-2022-jp, iso-2022-jp
50221: Japanese (JIS-Allow 1 byte Kana), iso-2022-jp, iso-2022-jp
50222: Japanese (JIS-Allow 1 byte Kana – SO/SI), iso-2022-jp, iso-2022-jp
50225: Korean (ISO), iso-2022-kr, euc-kr
50227: Chinese Simplified (ISO-2022), x-cp50227, x-cp50227
51932: Japanese (EUC), euc-jp, euc-jp
51936: Chinese Simplified (EUC), EUC-CN, EUC-CN
51949: Korean (EUC), euc-kr, euc-kr
52936: Chinese Simplified (HZ), hz-gb-2312, hz-gb-2312
54936: Chinese Simplified (GB18030), GB18030, GB18030
57002: ISCII Devanagari, x-iscii-de, x-iscii-de
57003: ISCII Bengali, x-iscii-be, x-iscii-be
57004: ISCII Tamil, x-iscii-ta, x-iscii-ta
57005: ISCII Telugu, x-iscii-te, x-iscii-te
57006: ISCII Assamese, x-iscii-as, x-iscii-as
57007: ISCII Oriya, x-iscii-or, x-iscii-or
57008: ISCII Kannada, x-iscii-ka, x-iscii-ka
57009: ISCII Malayalam, x-iscii-ma, x-iscii-ma
57010: ISCII Gujarati, x-iscii-gu, x-iscii-gu
57011: ISCII Punjabi, x-iscii-pa, x-iscii-pa
65000: Unicode (UTF-7), utf-7, utf-7
65001: Unicode (UTF-8), utf-8, utf-8
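The table above comes from .NET, but the same code page numbers can be used outside it. As a rough illustration, the Python sketch below maps a Windows code page number to a Python codec name. The helper python_codec_for and its small table of special cases are assumptions of this example; many of the listed code pages (the Mac and ISCII ones, for instance) have no codec under the cpNNNN naming in Python, in which case the helper returns None.

```python
import codecs

# Code pages whose Python codec name does not follow the "cpNNNN" pattern.
SPECIAL = {
    1200: "utf-16-le",
    1201: "utf-16-be",
    65000: "utf-7",
    65001: "utf-8",
}


def python_codec_for(code_page: int):
    """Return the Python codec name for a Windows code page, or None if unknown."""
    name = SPECIAL.get(code_page, f"cp{code_page}")
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


if __name__ == "__main__":
    for cp in (437, 1250, 1252, 1200, 65001):
        print(cp, python_codec_for(cp))
```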

Posted in .NET, Encoding