rmoff

October 19, 2010

awk – split a fixed width file into separate files named on content

Filed under: obiee — rmoff @ 13:31

More of a unix thing than DW/BI this post, but I have a beard so am semi-qualified….

The requirement was to improve the performance of some ODI processing that, as part of its work, took one huge input file and split it into chunks based on content in the file. To add some (minor) spice, the file was fixed width with no delimiters, so the easy awk answers that I found on Google weren’t applicable.

Source file, to be split based on cols 14-16:

0000000010069583000A
0000000010083583000A
0000000011600583000B
0000000011936584000D
0000000010101584000E
0000000010903584000G
0000000010517585000Q

Output files:
prefix.583.dat

0000000010069583000A
0000000010083583000A
0000000011600583000B

prefix.584.dat

0000000011936584000D
0000000010101584000E
0000000010903584000G

prefix.585.dat

0000000010517585000Q

So without further ado, my little command-line gem:

awk '{fn=substr($0,14,3);print > ("prefix." fn ".dat")}' file.dat

Where:

  • the output filename is built from the 3 characters starting at column 14 of each input record (substr($0,14,3))
  • file.dat is the input filename
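One caveat if the input holds many distinct key values: each output file stays open until awk exits, and some awk implementations can run out of file handles. Here is a minimal sketch of one way around that, assuming records with the same key are mostly contiguous (as in the sample above). It closes the previous output file whenever the key changes, and appends with >> so that a key reappearing later does not truncate its file. The scratch directory and the out/prev variables are illustrative and not part of the original command.

```shell
# Work in a scratch directory so the sketch does not touch real data
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample input from the post
cat > file.dat <<'EOF'
0000000010069583000A
0000000010083583000A
0000000011600583000B
0000000011936584000D
0000000010101584000E
0000000010903584000G
0000000010517585000Q
EOF

# Same split as the one-liner, but close each output file when the key changes.
# ">>" appends, so remove stale prefix.*.dat files before re-running.
awk '{
  fn  = substr($0, 14, 3)          # 3 characters starting at column 14
  out = "prefix." fn ".dat"
  if (out != prev) {               # key changed: release the previous file
    if (prev != "") close(prev)
    prev = out
  }
  print >> out
}' file.dat

ls prefix.*.dat
```

Because the files are appended to rather than overwritten, clear out any earlier prefix.*.dat output before running it against the same directory again.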

For extra brownie points, incorporate components of the input filename in the output filenames:

awk 'FNR==1 {dt=substr(FILENAME,6,8)} {fn=substr($0,14,3);print > ("prefix." fn "." dt ".dat")}' file.20101018.dat
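A point worth checking here is where FILENAME gets read. POSIX leaves FILENAME undefined inside a BEGIN block, because no input file has been opened yet, so the date component needs to be picked up once the first record is being processed. Here is a minimal sketch using FNR==1 for that, assuming the input is passed by bare filename (no directory path), so that the date starts at character 6. The scratch directory and the two-record sample file are illustrative.

```shell
# Work in a scratch directory so the sketch does not touch real data
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Small sample input, named like the example in the post: file.<yyyymmdd>.dat
printf '%s\n' 0000000010069583000A 0000000011936584000D > file.20101018.dat

# FNR==1 matches the first record of each input file, by which point FILENAME is set
awk 'FNR==1 {dt=substr(FILENAME,6,8)}
     {fn=substr($0,14,3); print > ("prefix." fn "." dt ".dat")}' file.20101018.dat

ls prefix.*.dat
```

If the file is passed with a leading path, the offset of 6 no longer lines up with the date, so the path would need to be stripped first.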

As a sidenote, this is an example of choosing the right tool for the job – when there are simple command-line tools that can be scripted like this, using a heavyweight tool like ODI is overkill and can indeed land you with performance problems.

[edit]
Ted Dziuba has written a great article on how unix utilities can often be the right tool for the job: Taco Bell Programming.
[/edit]

2 Comments

  1. Thanks for sharing.

    I will say: Choosing the right method that the right tool offers.

    Do you know which method/algorithm awk uses, and which method the ODI knowledge module uses, to manipulate the files?

    Cheers
    Nico

    Comment by Nicolas GERARD — October 19, 2010 @ 14:53

    • Hi Nico,
      I don’t know – I just got asked about awk as someone on the team who has a beard

      I’ll try and find out about ODI

      cheers, Robin.

      Comment by rnm1978 — October 19, 2010 @ 16:16

