encodargs for gawk
------------------

Manuel Collado, http://mcollado.z15.es
December, 2012 - Public domain

Overview
--------

This is demo code that shows how to process input files with different
character encodings in the same gawk invocation. The iconv utility is used
to convert the encoding on the fly.

Operation
---------

File arguments can be plain file names, or they can have an optional
::encoding suffix appended to them.

Encodargs replaces each special file::encoding argument with the name of a
temporary copy of the file, converted to the internal encoding that gawk
uses to process string values.

When the temporary copy is processed, the original file name is restored in
the FILENAME predefined variable.

A summary of each reencoding action is printed on /dev/stderr.

Usage
-----

Just put '-f encodargs.awk' as the first program parameter on the command
line. Instead of writing:

    gawk -f myprogram.awk ... filename ...

you can say:

    gawk -f encodargs.awk -f myprogram.awk ... filename::utf-8 ...

To further simplify usage, a wrapper shell script 'egawk' is provided.
The command invocation can then be just:

    egawk -f myprogram.awk ... filename::utf-8 ...

Examples
--------

    gawk -f encodargs.awk -f list.awk sample.cp1252.txt::cp1252

    egawk -f list.awk sample.utf8.txt::utf-8 sample.cp1252.txt::cp1252

Limitations
-----------

- encodargs relies on the LC_* or LANG environment variables to guess the
  encoding that gawk will use internally.
- The iconv utility must be installed on the system.
- The encoding name must be supported by iconv.
- The original files are fully duplicated, even if only part of a file
  needs to be processed.
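
To make the file::encoding handling more concrete, here is a minimal shell
sketch of the same idea. It is not the code in encodargs.awk. The function
names (arg_file, arg_encoding, reencode) are hypothetical, and the sketch
assumes that 'locale charmap' reports the encoding gawk will use internally
and that mktemp is available.

```shell
#!/bin/sh
# Illustrative sketch only; these helpers are hypothetical and are not
# taken from encodargs.awk.

# Print the file-name part of a "file::encoding" argument.
# An argument without a "::" suffix is printed unchanged.
arg_file() {
    case $1 in
        *::*) printf '%s\n' "${1%::*}" ;;
        *)    printf '%s\n' "$1" ;;
    esac
}

# Print the encoding part of a "file::encoding" argument,
# or an empty line when there is no "::" suffix.
arg_encoding() {
    case $1 in
        *::*) printf '%s\n' "${1##*::}" ;;
        *)    printf '\n' ;;
    esac
}

# Convert file $1 from encoding $2 into a temporary copy and print the
# name of that copy on stdout. A summary line goes to stderr.
# Assumption: 'locale charmap' names the target encoding.
reencode() {
    src=$1
    from=$2
    to=$(locale charmap) || return 1
    tmp=$(mktemp "${TMPDIR:-/tmp}/encodargs.XXXXXX") || return 1
    if iconv -f "$from" -t "$to" "$src" > "$tmp"; then
        printf 'reencoded %s: %s -> %s\n' "$src" "$from" "$to" >&2
        printf '%s\n' "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

With these helpers, an argument such as sample.cp1252.txt::cp1252 would be
split into the file name sample.cp1252.txt and the encoding cp1252, and
reencode would produce the temporary copy to pass to gawk in its place.
Restoring the original name in FILENAME has to happen inside the awk
program itself, so it is not covered by this sketch.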