This library does not support fixed-width text files. I'm reluctant to implement such a format, because I think it is conceptually invalid in the modern world. The primary advantage of fixed-width formats was that you could seek directly to a specific record in a file by computing its offset (record index × row length). For that to work, every row needs to be the same length. Back when people were writing code in versions of C where a character and a byte were the same thing, that wasn't much of a problem. In a post-Unicode world, a character can occupy a varying number of bytes depending on the encoding. So what would be fixed per line: the number of bytes or the number of characters? I think most users want to think in terms of characters, but encoding characters in UTF-8 produces a varying number of bytes. The result is that a fixed-character-width file offers no advantage: each row can still be a different number of bytes, so you lose the ability to seek. Can you describe what you consider fixed-width?
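To make the character/byte mismatch concrete, here is a minimal Python sketch (the sample rows are hypothetical). Both rows are padded to exactly 10 characters, yet in UTF-8 one of them takes 11 bytes, so a byte offset of `row_index * 10` would no longer land on a row boundary. In a single-byte encoding such as Latin-1 the same rows stay at 10 bytes each, which is why the answer depends on the encoding.

```python
def char_and_byte_widths(rows, encoding="utf-8"):
    """Return (character count, encoded byte count) for each row."""
    return [(len(row), len(row.encode(encoding))) for row in rows]


# Two hypothetical rows, each padded to exactly 10 *characters*.
rows = ["cafe".ljust(10), "caf\u00e9".ljust(10)]  # "\u00e9" is "é"

for chars, nbytes in char_and_byte_widths(rows):
    print(chars, nbytes)
# 10 10
# 10 11  ("é" needs 2 bytes in UTF-8)

print(char_and_byte_widths(rows, "latin-1"))
# [(10, 10), (10, 10)]  (1 byte per character in Latin-1)
```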
-
Thank you. Fixed-width files in my case are exactly that: fixed column widths, e.g. A -> 10 chars, B -> 5 chars, and so on, with no delimiters and a fixed number of columns. I cannot control the format of these files, as they come from hundreds of our customers. This is a legacy use case. Can you at least point me in the right direction if I were to attempt this in your library?
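The core of such a reader is splitting each decoded line at precomputed character offsets. Below is a minimal Python sketch, not this library's API. It assumes widths are counted in characters (as in the "A -> 10 chars, B -> 5 chars" example), that the file has already been decoded with the correct encoding, and that trailing padding spaces should be stripped. `LAYOUT`, `make_slices`, and `parse_fixed_width` are hypothetical names.

```python
from itertools import accumulate
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical layout: (column name, width in characters).
# Real layouts would vary per customer.
LAYOUT: List[Tuple[str, int]] = [("A", 10), ("B", 5)]


def make_slices(layout: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Precompute (name, start, end) character offsets for each column."""
    ends = list(accumulate(width for _, width in layout))
    starts = [0] + ends[:-1]
    return [(name, s, e) for (name, _), s, e in zip(layout, starts, ends)]


def parse_fixed_width(
    lines: Iterable[str], layout: List[Tuple[str, int]]
) -> Iterator[Dict[str, str]]:
    """Split each line into fields by position; trailing padding is stripped.

    A line shorter than the layout yields empty strings for missing columns.
    """
    slices = make_slices(layout)
    for line in lines:
        line = line.rstrip("\r\n")
        yield {name: line[s:e].rstrip() for name, s, e in slices}


if __name__ == "__main__":
    sample = ["Alice".ljust(10) + "00042", "Bob".ljust(10) + "00007"]
    for record in parse_fixed_width(sample, LAYOUT):
        print(record)
```

Because a text-mode file object is itself an iterable of lines, passing `open(path, encoding="utf-8")` as `lines` would stream a large file without loading it all at once. Whether UTF-8 (or any single encoding) is correct depends on what each customer actually sends.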
-
Excellent work! I have a related use case, a variant of CSV: I need to load huge text files structured as fixed-width columns, parse them, and run user-provided action code on each loaded line in parallel. I believe your library could do this very efficiently. Do you have any advice, or time to build such a feature?
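A minimal sketch of that parse-then-act-in-parallel pipeline, written in Python rather than this library's language, with hypothetical names (`process_in_parallel`, `run_batch`, `example_action`) and a hypothetical two-column layout. Lines are grouped into batches, each batch is parsed and handed to the action on an executor, and only a bounded number of batches are in flight at once so a huge file is never held in memory. Results come back in input order.

```python
from collections import deque
from concurrent.futures import Executor, ThreadPoolExecutor
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]
Slices = List[Tuple[str, int, int]]  # (column name, start, end) in characters


def parse_line(line: str, slices: Slices) -> Record:
    """Cut one line into named fields by character position."""
    line = line.rstrip("\r\n")
    return {name: line[start:end].rstrip() for name, start, end in slices}


def batched(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group lines into lists of at most `size` items."""
    it = iter(lines)
    while batch := list(islice(it, size)):
        yield batch


def run_batch(batch: List[str], slices: Slices,
              action: Callable[[Record], Any]) -> List[Any]:
    """Parse every line in a batch and apply the action to each record."""
    return [action(parse_line(line, slices)) for line in batch]


def process_in_parallel(
    lines: Iterable[str],
    slices: Slices,
    action: Callable[[Record], Any],
    executor: Executor,
    batch_size: int = 10_000,
    max_in_flight: int = 8,
) -> Iterator[Any]:
    """Yield action results in input order, keeping at most
    `max_in_flight` batches submitted but not yet collected."""
    pending = deque()
    for batch in batched(lines, batch_size):
        pending.append(executor.submit(run_batch, batch, slices, action))
        if len(pending) >= max_in_flight:
            yield from pending.popleft().result()
    while pending:
        yield from pending.popleft().result()


def example_action(record: Record) -> str:
    """Hypothetical per-line action: upper-case column A."""
    return record["A"].upper()


if __name__ == "__main__":
    slices = [("A", 0, 10), ("B", 10, 15)]
    lines = ["Alice".ljust(10) + "00042", "Bob".ljust(10) + "00007"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for result in process_in_parallel(lines, slices, example_action, pool, batch_size=1):
            print(result)
```

In CPython, `ThreadPoolExecutor` only runs actions truly in parallel when they release the GIL (for example, I/O). For CPU-bound actions, a `ProcessPoolExecutor` could be passed instead, provided the action is a picklable module-level function.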