Issue:
How to use two comma separated fields as a record identifier in Advanced Text Schema?
Solution:
For the solution to this issue, we use following example:
The sample file where the first two comma separated fields need to be used as record identifier:
ABCD111421,H,Field3,Field4,Field5
ABCD111421,F,Field3,Field4,Field5
ABCD111421,D,Field3,Field4,Field5
LVJK074313,A,Field3,Field4,Field5
LVJK074313,E,Field3,Field4,Field5
LVJK074313,C,Field3,Field4,Field5
SWXY016231,R,Field3,Field4,Field5
SWXY016231,O,Field3,Field4,Field5
SWXY016231,M,Field3,Field4,Field5
We can fetch the desired outcome by defining the regular expression in the Advanced Text Schema and following are steps to achieve it. Also check the attached sample source file and XSD for more details.
- Record Separator: \n
Records are separated in the input file by new line so \n is used as record separator in Windows and in Linux the record separator will be \r\n.
- Field Separator: ,(?![HFDAECROM])
In the definition of field separator "," is used as field separator and negative look ahead (?!) is used to check if "," is followed by any one of the characters (HFDAECROM) then it will not separate the field(consider it a single field) and will take it as a record identifier.
- Field Definition for Record Identifier:
[A-Z]{1,4}[0-9]{1,6}\,H
[A-Z]{1,4}[0-9]{1,6}\,F
[A-Z]{1,4}[0-9]{1,6}\,D
....... ............
....... ............
[A-Z]{1,4}[0-9]{1,9}\,M
We can determine the definition for Record Identifier as above where [A-Z] states that the first 4 characters of first field can contain the characters between A-Z where minimum occurrence of a character is 1 and maximum is 4 ({1,4}). Here [0-9] states that first field can contain the digits between 0 to 9 with minimum occurrence of 1 and maximum 6, and (\) is used to handle special character (,) between first and second field in order to use it as record identifier.
Comments
0 comments
Article is closed for comments.