|
dev
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
BCP and CSV - my conclusionsusing BCP, and I suspect that they will continue to crop up for ever. The conclusion that I have come to is that you might, if you are persistent and lucky, be able to get something working. Be prepared, however, for it to be tricky to achieve, and a little contrived, in a lot of real-life circumstances. It may even prove to be impossible if you are really unlucky with your data. So far I have only explored exporting to CSV, because that's all I have needed to do. If anyone has wrangled with importing CSVs and come to any useful conclusions, please post a follow-up. CSV is more than just putting "," in between all the fields using a format file and then jumping through a hoop to get the leading and trailing quotes in as well before and after the first and last fields respectively. For a start, numeric fields should not have quotes around them, so your format file is going to have to be crafted so that each field terminator is correct in respect of its own data type and the data type of the following field. Secondly, any double-quotes characters within a field need to be doubled up, so that the text [a piece of wood 2" long] becomes "a piece of wood 2"" long". This means that you can't just give a table name, you have to write an SQL query. It also makes the query considerably longer, all character fields that might contain a double-quotes character need to be wrapped in a translation function that replaces " with "". Now you aren't likely to hit the 8192 character limit for the length of Command Prompt commands, but this does make for some pretty unwieldy statements. The icing on the cake is how to deal with embedded carriage return characters in your data. Excel seems to be able to cope with this; when it comes across a line-end it seems to check to see if it is within a field that is enclosed in quotes, and accepts the line-end character as part of the cell value. I have no idea how BCP would cope with this - presumably if you are exporting data then it will happily write the line-end characters out, and it is up to the receiving program to handle them, as I said Excel seems to do this. As to what BCP would do with line ends inside a delimited field at the import stage, I have no idea. Phil. I prefer Tab-delimited, or some other delimited format, when I need to use
delimited files with BCP. Excel has the ability to save files in Tab-delimited format. Show quote "PhilHibbs" <sna***@gmail.com> wrote in message news:1192539859.727334.97190@i38g2000prf.googlegroups.com... > There have been several threads here on reading/writing CSV files > using BCP, and I suspect that they will continue to crop up for ever. > The conclusion that I have come to is that you might, if you are > persistent and lucky, be able to get something working. Be prepared, > however, for it to be tricky to achieve, and a little contrived, in a > lot of real-life circumstances. It may even prove to be impossible if > you are really unlucky with your data. So far I have only explored > exporting to CSV, because that's all I have needed to do. If anyone > has wrangled with importing CSVs and come to any useful conclusions, > please post a follow-up. > > CSV is more than just putting "," in between all the fields using a > format file and then jumping through a hoop to get the leading and > trailing quotes in as well before and after the first and last fields > respectively. > > For a start, numeric fields should not have quotes around them, so > your format file is going to have to be crafted so that each field > terminator is correct in respect of its own data type and the data > type of the following field. Secondly, any double-quotes characters > within a field need to be doubled up, so that the text [a piece of > wood 2" long] becomes "a piece of wood 2"" long". This means that you > can't just give a table name, you have to write an SQL query. It also > makes the query considerably longer, all character fields that might > contain a double-quotes character need to be wrapped in a translation > function that replaces " with "". Now you aren't likely to hit the > 8192 character limit for the length of Command Prompt commands, but > this does make for some pretty unwieldy statements. > > The icing on the cake is how to deal with embedded carriage return > characters in your data. Excel seems to be able to cope with this; > when it comes across a line-end it seems to check to see if it is > within a field that is enclosed in quotes, and accepts the line-end > character as part of the cell value. I have no idea how BCP would cope > with this - presumably if you are exporting data then it will happily > write the line-end characters out, and it is up to the receiving > program to handle them, as I said Excel seems to do this. As to what > BCP would do with line ends inside a delimited field at the import > stage, I have no idea. > > Phil. > PhilHibbs (sna***@gmail.com) writes:
> CSV is more than just putting "," in between all the fields using a BCP is a fairly squared tool. It's good for what it does, but if you> format file and then jumping through a hoop to get the leading and > trailing quotes in as well before and after the first and last fields > respectively. > > For a start, numeric fields should not have quotes around them, so > your format file is going to have to be crafted so that each field > terminator is correct in respect of its own data type and the data > type of the following field. Secondly, any double-quotes characters > within a field need to be doubled up, so that the text [a piece of > wood 2" long] becomes "a piece of wood 2"" long". need to create files with that special format, BCP is not your friend. Have you looked at Integration Services? I haven't myself, but there should be a lot more options for file-export, including CSVs. > The icing on the cake is how to deal with embedded carriage return When you use a format file, BCP reads the file as stream of bytes. > characters in your data. Excel seems to be able to cope with this; > when it comes across a line-end it seems to check to see if it is > within a field that is enclosed in quotes, and accepts the line-end > character as part of the cell value. I have no idea how BCP would cope > with this - presumably if you are exporting data then it will happily > write the line-end characters out, and it is up to the receiving > program to handle them, as I said Excel seems to do this. As to what > BCP would do with line ends inside a delimited field at the import > stage, I have no idea. CR-LF are just two bytes in the stream. If CR-LF is the terminator for the currenr field, then the field will stop there. If the current field has another terminator, the CR-LF will be part of the data. Thus embedded line breaks in the last field will mean that all the stuff after the line break will end up in the first field of the next record. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx On 16 Oct, 22:59, Erland Sommarskog <esq***@sommarskog.se> wrote:
> When you use a format file, BCP reads the file as stream of bytes. IBM DataStage seems to work in exactly the same way, you can't have> CR-LF are just two bytes in the stream. If CR-LF is the terminator > for the currenr field, then the field will stop there. If the current > field has another terminator, the CR-LF will be part of the data. Thus > embedded line breaks in the last field will mean that all the stuff > after the line break will end up in the first field of the next > record. embedded line-ends in the last field. Phil Hibbs. PhilHibbs (sna***@gmail.com) writes:
> On 16 Oct, 22:59, Erland Sommarskog <esq***@sommarskog.se> wrote: Well, you can can have embedded newlines in any field with BCP. What you>> When you use a format file, BCP reads the file as stream of bytes. >> CR-LF are just two bytes in the stream. If CR-LF is the terminator >> for the currenr field, then the field will stop there. If the current >> field has another terminator, the CR-LF will be part of the data. Thus >> embedded line breaks in the last field will mean that all the stuff >> after the line break will end up in the first field of the next >> record. > > IBM DataStage seems to work in exactly the same way, you can't have > embedded line-ends in the last field. cannot have is the field terminator in the field itself. Often when I run BCP in character mode, I use @!@ for field delimiters and \n<->\n as line terminators. Of course that format may not be useful if you are exporting to for instance Excel. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx In my experience, SSIS (and the data import wizard) seems to deal with the
full CSV format fine. Linchi Show quote "Erland Sommarskog" wrote: > PhilHibbs (sna***@gmail.com) writes: > > CSV is more than just putting "," in between all the fields using a > > format file and then jumping through a hoop to get the leading and > > trailing quotes in as well before and after the first and last fields > > respectively. > > > > For a start, numeric fields should not have quotes around them, so > > your format file is going to have to be crafted so that each field > > terminator is correct in respect of its own data type and the data > > type of the following field. Secondly, any double-quotes characters > > within a field need to be doubled up, so that the text [a piece of > > wood 2" long] becomes "a piece of wood 2"" long". > > BCP is a fairly squared tool. It's good for what it does, but if you > need to create files with that special format, BCP is not your friend. > > Have you looked at Integration Services? I haven't myself, but there > should be a lot more options for file-export, including CSVs. > > > The icing on the cake is how to deal with embedded carriage return > > characters in your data. Excel seems to be able to cope with this; > > when it comes across a line-end it seems to check to see if it is > > within a field that is enclosed in quotes, and accepts the line-end > > character as part of the cell value. I have no idea how BCP would cope > > with this - presumably if you are exporting data then it will happily > > write the line-end characters out, and it is up to the receiving > > program to handle them, as I said Excel seems to do this. As to what > > BCP would do with line ends inside a delimited field at the import > > stage, I have no idea. > > When you use a format file, BCP reads the file as stream of bytes. > CR-LF are just two bytes in the stream. If CR-LF is the terminator > for the currenr field, then the field will stop there. If the current > field has another terminator, the CR-LF will be part of the data. Thus > embedded line breaks in the last field will mean that all the stuff > after the line break will end up in the first field of the next > record. > > > -- > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > |
|||||||||||||||||||||||