|
dev
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
bcp accent problemHello,
I'm importing data from formatted text files with accent (Latin1). My database is set to French_CI_AS. I use bcp with -C"Latin1" or -C"French_CI_AS" but bcp seems to ignore the flag and every single accent is left behind. That is if on column is "rené" in the source file, it becomes "ren" in the database. From experience, the -C flag should fix that problem but here nothing seems to work. Any ideas anyone? Thanks, Sitar. Sitar (Si***@discussions.microsoft.com) writes:
> I'm importing data from formatted text files with accent (Latin1). My For -C you specify a code page, not a collation, as I recall. Thus, if the> database is set to French_CI_AS. I use bcp with -C"Latin1" or > -C"French_CI_AS" but bcp seems to ignore the flag and every single > accent is left behind. > > That is if on column is "rené" in the source file, it becomes "ren" in the > database. > > From experience, the -C flag should fix that problem but here nothing > seems to work. file is in the ANSI code page, specify -C ANSI or -C RAW. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx Hi Erland,
Actually when I use bcp directly it works with either ANSI or Latin1. But when I use a bcp format file it just stops working properly. -- Sitar Show quote "Erland Sommarskog" wrote: > For -C you specify a code page, not a collation, as I recall. Thus, if the > file is in the ANSI code page, specify -C ANSI or -C RAW. > > > -- > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > Sitar (Si***@discussions.microsoft.com) writes:
> Actually when I use bcp directly it works with either ANSI or Latin1. But So how does the format file look like? Do you explicitly specify the > when I use a bcp format file it just stops working properly. collation in it? -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx The bcp format file looks like this:
8.0 11 1 SQLCHAR 0 0 "" 0 NUM_SEQ_SERVICE "" 2 SQLCHAR 0 6 ";" 2 CODE_ETABL_ID French_CI_AS 3 SQLCHAR 0 35 ";" 4 LIBELLE_ETABL French_CI_AS 4 SQLCHAR 0 4 ";" 5 CODE_CATEG_ETABL French_CI_AS 5 SQLCHAR 0 2 ";" 3 CODE_REGION_ID French_CI_AS 6 SQLCHAR 0 24 ";" 6 DATE_DEBUT "" 7 SQLCHAR 0 24 "\r\n" 7 DATE_FIN "" 8 SQLCHAR 0 0 "" 0 OCCURRENCE_PREVALIDEE "" 9 SQLCHAR 0 0 "" 0 ETAT_OCCURRENCE French_CI_AS 10 SQLCHAR 0 0 "" 0 OCCURRENCE_ACTIVEE "" 11 SQLCHAR 0 0 "" 0 DER_DATE_SAISIE "" I tried with ANSI and RAW but it just does not do anything. Sitar ___ Show quote "Erland Sommarskog" wrote: > Sitar (Si***@discussions.microsoft.com) writes: > > Actually when I use bcp directly it works with either ANSI or Latin1. But > > when I use a bcp format file it just stops working properly. > > So how does the format file look like? Do you explicitly specify the > collation in it? > > > > > -- > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > Actually if I execute that commands from the server itself it works fine.
The problem occurs when I execute the command from another machine. More specifically the IIS machine. Sitar --- Show quote "Erland Sommarskog" wrote: > Sitar (Si***@discussions.microsoft.com) writes: > > Actually when I use bcp directly it works with either ANSI or Latin1. But > > when I use a bcp format file it just stops working properly. > > So how does the format file look like? Do you explicitly specify the > collation in it? > > > > > -- > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > Sitar (Si***@discussions.microsoft.com) writes:
> Actually if I execute that commands from the server itself it works fine. And the regional settings of that machine is French?> The problem occurs when I execute the command from another machine. More > specifically the IIS machine. Do you lose the accents, when run from a command-line window on the IIS server? Or only when you run from within IIS? Could you post the CREATE TABLE statement for the table, and some piece of sample data? It's good if you can pack into a zip file, to prevent that data get mashed in news transport. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > And the regional settings of that machine is French? Both the IIS server and SQL Server machines' regional settings are french.> Do you lose the accents, when run from a command-line window on the IIS Actually I haven't tried yet from a command-line window on the IIS server. I > server? Or only when you run from within IIS? tried from SQL Query Analyzer on my dev machine and it works fine, providing the format files and data files are hosted on the sql server. Otherwise the accents are dropped. I'll try on the IIS server. > Could you post the CREATE TABLE statement for the table, and some piece Not sure if I I'm allowed to post actual data but I'll see what I can do.> of sample data? It's good if you can pack into a zip file, to prevent > that data get mashed in news transport. - Sitar Sitar (Si***@discussions.microsoft.com) writes:
> Actually I haven't tried yet from a command-line window on the IIS This sounds funny. Did exactly say how you run the BCP command. You> server. I tried from SQL Query Analyzer on my dev machine and it works > fine, providing the format files and data files are hosted on the sql > server. Otherwise the accents are dropped. tried from Query Analyzer. This is not from where I run my command-line tools. Does this mean that you run BCP through xp_cmdshell? Or are you in fact using BULK INSERT? > Not sure if I I'm allowed to post actual data but I'll see what I can do. It does not have to be actual data. Just make up some dummy datathat demonstrates the problem. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx Hi,
Sorry I haven't been specific enough. I ran bcp both from a dos prompt and also from the Query Analyzer with the bulk import command. With the bulk import command both the data file and the format file were stored on the SQL server. The problem seems to occur only when the data files and format files are stored on another machine than the Sql Server. So I suspect somewhere between the client and the sql server the data may get corrupted or converted to an from an english locale and the accent could be dropped. But I'm not sure why this would be happening. - Sitar Show quote > This sounds funny. Did exactly say how you run the BCP command. You > tried from Query Analyzer. This is not from where I run my command-line > tools. Does this mean that you run BCP through xp_cmdshell? Or are you > in fact using BULK INSERT? > > > Not sure if I I'm allowed to post actual data but I'll see what I can do. > > It does not have to be actual data. Just make up some dummy data > that demonstrates the problem. Sitar (Si***@discussions.microsoft.com) writes:
> I ran bcp both from a dos prompt and also from the Query Analyzer with All I can say is that it sounds very strange.> the bulk import command. With the bulk import command both the data file > and the format file were stored on the SQL server. The problem seems to > occur only when the data files and format files are stored on another > machine than the Sql Server. So I suspect somewhere between the client > and the sql server the data may get corrupted or converted to an from an > english locale and the accent could be dropped. But I'm not sure why > this would be happening. Could you clarify exactly what unwanted conversion you get? If you have "C'est un problème même si c'est Noël ou Pâques.", what does this end up as? -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > Could you clarify exactly what unwanted conversion you get? If you ==> "C'est un problme mme si c'est Nol ou Pques."> have "C'est un problème même si c'est Noël ou Pâques.", what does > this end up as? All the chars with accent are simply dropped out. - Sitar Sitar (Si***@discussions.microsoft.com) writes:
>> Could you clarify exactly what unwanted conversion you get? If you That's even stranger. Normally problems with character translation>> have "C'est un problème même si c'est Noël ou Pâques.", what does >> this end up as? > >==> "C'est un problme mme si c'est Nol ou Pques." > > All the chars with accent are simply dropped out. results in the appearance of other characters. This could be pure junk characters, replacement characters (é becomes e etc) or some general placeholder like ?. But I have not seen a case where characters just gets dropped. There is something really fishy going here. It is as if you would have a network that filters all bytes with the high bit set. Could it be that the characters are replaced by some dummy character that is is invisible. If you import the sentence above, and then do SELECT convert(varbinar(80), col) on it, what do you get? Another thing to try. Put that file on a remote disk, and then do EXEC master..xp_cmdshell 'Type \\REMOTE\share\file.txt' and see how this displays. But I'm starting to suspect that whatever goes on here, SQL Server is innocent... -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > SELECT convert(varbinar(80), col) on it, what do you get? If I do that on a varchar column that I already have imported it says <Binary>> EXEC master..xp_cmdshell 'Type \\REMOTE\share\file.txt' Here I have other characters such as:524165;Rh¶ne Loire;ET;54;20051001;23000101 instead of: 524165;Rhône Loire;ET;54;20051001;23000101 I executed the command from SqlQuery Analyzer on my development machine with the file located on the Sql Server. - Sitar Sitar (Si***@discussions.microsoft.com) writes:
> What says <Binary>? Query Analyzer would not display a binary value like>> SELECT convert(varbinar(80), col) on it, what do you get? > > If I do that on a varchar column that I already have imported it says ><Binary> that. Please run the command in Query Analyzer. >> EXEC master..xp_cmdshell 'Type \\REMOTE\share\file.txt' So that is a classic OEM-to-ANSI conversion. xp_cmdshell assumes that > > Here I have other characters such as: > 524165;Rh¶ne Loire;ET;54;20051001;23000101 > instead of: > 524165;Rhône Loire;ET;54;20051001;23000101 the output is the OEM code page, so it interpretes ô which is 0xF4 as a the pilcrow ¶. > I executed the command from SqlQuery Analyzer on my development machine Actually, I explicitly asked you to try this with a file located on> with the file located on the Sql Server. a remote server, as you had indicated that when you bulk-loaded a file that was local on the server, it went fine. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > What says <Binary>? Query Analyzer would not display a binary value like IT says <Binary> in the Enterprise Manager sorry I thought it'd be the same.> that. Please run the command in Query Analyzer. The results with the Query Analyzer look like: 0x426F726465617578 0x5375642D417175697461696E65 0x4C616E677565646F632D526F757373696C6C6F6E 0x50617269732D457374 .... > > I executed the command from SqlQuery Analyzer on my development machine I don't have rights to do that on the network. I tried to mount a z: drive > > with the file located on the Sql Server. > > Actually, I explicitly asked you to try this with a file located on > a remote server, as you had indicated that when you bulk-loaded a file > that was local on the server, it went fine. on the sql server but the command: EXEC master..xp_cmdshell 'Type z:\file.txt' returns "incorrect filepath" whereas I can access it fine from a dos prompt on the same computer. I can't access a file located on another computer from the my Sql Server either. Or maybe there are command line parameters to specify a username/pwd that I haven't found. - Sitar Sitar (Si***@discussions.microsoft.com) writes:
> The results with the Query Analyzer look like: Sigh. You are certainly not making much effort to get this sorted out.> 0x426F726465617578 > 0x5375642D417175697461696E65 > 0x4C616E677565646F632D526F757373696C6C6F6E > 0x50617269732D457374 > ... When I run: select convert(varchar, 0x426F726465617578) select convert(varchar, 0x5375642D417175697461696E65) select convert(varchar, 0x4C616E677565646F632D526F757373696C6C6F6E) select convert(varchar, 0x50617269732D457374) I get: Bordeaux Sud-Aquitaine Languedoc-Roussillon Paris-Est No letters missing there what I can see. Then again, these names do not include any troublesome characters. > I don't have rights to do that on the network. I tried to mount a z: You would need to use a UNC path, as in \\server\dir1\dir2.> drive on the sql server but the command: EXEC master..xp_cmdshell 'Type > > z:\file.txt' returns "incorrect filepath" whereas I can access it fine > from a dos prompt on the same computer. If you mount a disk, that will not be visible for the SQL Server process. And if SQL Server is running under local system, it cannot access network resources at all. Anyway, that was just a test to see if there was something general on the network going on, and we can't perform that test, we can't. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Show quote
"Erland Sommarskog" wrote: Well how am I suppose to figure out that:> Sitar (Si***@discussions.microsoft.com) writes: > > The results with the Query Analyzer look like: > > 0x426F726465617578 > > 0x5375642D417175697461696E65 > > 0x4C616E677565646F632D526F757373696C6C6F6E > > 0x50617269732D457374 > > ... > > Sigh. You are certainly not making much effort to get this sorted out. > When I run: > > select convert(varchar, 0x426F726465617578) > select convert(varchar, 0x5375642D417175697461696E65) > select convert(varchar, 0x4C616E677565646F632D526F757373696C6C6F6E) > select convert(varchar, 0x50617269732D457374) SELECT convert(varbinar(80), col) actually means you to use the result of the previous query? Be more precise please. If I had the knowledge you have I wouldn't be asking in the first place... DB is not my field of expertise... I thought "col" was supposed to be the column name, and I got the result I pasted here... > I get: Yes, none of these words have accent, so what does this prove?> > Bordeaux > Sud-Aquitaine > Languedoc-Roussillon > Paris-Est > > No letters missing there what I can see. Then again, these names do > not include any troublesome characters. > You would need to use a UNC path, as in \\server\dir1\dir2. That is what I did to begin with but it DOES NOT work. It says unknown user or incorrect password, no matter how I try and I have admin rights on both my computer and the Sql Server, but not on the domain. > And if SQL Server is running under local system, it cannot That explains why the UNC path does not work then? So shall I change the > access network resources at all. local account running Sql Server? > Anyway, that was just a test to see if there was something general on I apologize if my reply is a bit cold, I had to work a bit this week-end and > the network going on, and we can't perform that test, we can't. I'm kinda grumpy... It's monday... - Sitar Sitar (Si***@discussions.microsoft.com) writes:
> Well how am I suppose to figure out that: Obviously I am asking to see how the bulk-loaded data where accents have> SELECT convert(varbinar(80), col) > actually means you to use the result of the previous query? Be more > precise please. been lost looks in binary form. I wanted to see if the characters could have been replaced by some invisible characters. > I just had the network admin over the phone. It seems the samba server for Are your client also using Samba?> this domain is not up to date and has a problem handling... guess what... > accents... > > Ok, so that explains *maybe* the problem. Maybe because the same problem > occured on the test platform of our client. And unless it is bad luck I > doubt their servers have the same problem as ours. So there must be > something else. I hardly even know what Samba is. Is that not Unix-related in some way? >> You would need to use a UNC path, as in \\server\dir1\dir2. I'm not sure, but my guess is that SQL Server runs as a domain> > That is what I did to begin with but it DOES NOT work. It says unknown > user or incorrect password, no matter how I try and I have admin rights > on both my computer and the Sql Server, but not on the domain. user (in opposition to LocalSystem) but lacks permission. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > Are your client also using Samba? Under investigation, as it is a big network it is possible there is a samba server somewhere. > I hardly even know what Samba is. Is that not Unix-related in some way? Yes, file sharing for the part I'm concerned with.http://www.samba.org/samba/what_is_samba.html > I'm not sure, but my guess is that SQL Server runs as a domain I'll check that.> user (in opposition to LocalSystem) but lacks permission. "Erland Sommarskog" wrote: I cannot access any network resource except if \\server is the SqlServer > >> You would need to use a UNC path, as in \\server\dir1\dir2. > > > > That is what I did to begin with but it DOES NOT work. It says unknown > > user or incorrect password, no matter how I try and I have admin rights > > on both my computer and the Sql Server, but not on the domain. > > I'm not sure, but my guess is that SQL Server runs as a domain > user (in opposition to LocalSystem) but lacks permission. itself. And I ran both Sql Server and SqlServer Agent under Administrator account. I mounted a drive with net use and the local Administrator is allowed to access that ressource. I even set xp_sqlagent_proxy_account to an account that can access the resource. It does not work either. - Sitar Hi,
Well the problem on the Samba server side has been fixed. It did not change much on my side. Well instead of having the accents dropped now I have weird characters instead but I'm not sure if it is due to the change of settings on the Samba server or not. I have tried different options, as the files I have to bcp in are in UTF-8 now, I tried a -C 65001 to force a UTF-8 codepage. No change. I'm getting a bit desperate at that point, I have no clue as to what is going on and we're supposed to ship a clean release to our client in about 3 weeks. - Sitar ___ Show quote "Erland Sommarskog" wrote: > Sitar (Si***@discussions.microsoft.com) writes: > > Well how am I suppose to figure out that: > > SELECT convert(varbinar(80), col) > > actually means you to use the result of the previous query? Be more > > precise please. > > Obviously I am asking to see how the bulk-loaded data where accents have > been lost looks in binary form. I wanted to see if the characters could > have been replaced by some invisible characters. > > > I just had the network admin over the phone. It seems the samba server for > > this domain is not up to date and has a problem handling... guess what... > > accents... > > > > Ok, so that explains *maybe* the problem. Maybe because the same problem > > occured on the test platform of our client. And unless it is bad luck I > > doubt their servers have the same problem as ours. So there must be > > something else. > > Are your client also using Samba? > > I hardly even know what Samba is. Is that not Unix-related in some way? > > >> You would need to use a UNC path, as in \\server\dir1\dir2. > > > > That is what I did to begin with but it DOES NOT work. It says unknown > > user or incorrect password, no matter how I try and I have admin rights > > on both my computer and the Sql Server, but not on the domain. > > I'm not sure, but my guess is that SQL Server runs as a domain > user (in opposition to LocalSystem) but lacks permission. > > -- > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > Sitar (Si***@discussions.microsoft.com) writes:
> Well the problem on the Samba server side has been fixed. It did not Weird characters are a lot better than no characters at all.> change much on my side. Well instead of having the accents dropped now I > have weird characters instead but I'm not sure if it is due to the > change of settings on the Samba server or not. Could you care to give a sample of how the data gets garbled? Please posted the garbled data as hex string (convert(binary), so that they don't get distored in news transsport. Of course, you need to include the original strings as well. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx Hi,
These articles mention that Sql Server does not support UTF-8 and cannot do a proper conversion by itself. Also when I do a bulk insert with a codepage of 65001 which is supposed to be UTF-8 it still corrupts accentuated characters. If I convert the source file into ANSI. I have no problem and the data is fine. But I have no control on the system that generates the source file so it has to be UTF-8. http://support.microsoft.com/?scid=kb%3Ben-us%3B232580&x=14&y=14http://msdn.microsoft.com/library/default.asp?URL=/library/techart/IntlFeaturesInSQLServer2000.htm Regards, Sitar. I just had the network admin over the phone. It seems the samba server for
this domain is not up to date and has a problem handling... guess what... accents... Ok, so that explains *maybe* the problem. Maybe because the same problem occured on the test platform of our client. And unless it is bad luck I doubt their servers have the same problem as ours. So there must be something else. Our server should be patched within the week hopefully. - Sitar |
|||||||||||||||||||||||