Here are some of the tools I use to process and clean data from all manner of customers:
The detox utility renames files to make them easier to work with. It removes spaces and other such annoyances. It’ll also translate or clean up Latin-1 (ISO 8859-1) characters encoded in 8-bit ASCII, Unicode characters encoded in UTF-8, and CGI-escaped characters.
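Here is a rough sketch of how I might wrap detox so that the default is a harmless preview. The function name clean_names and the "apply" argument are made up for this example; the -n (dry run), -r (recursive) and -v (verbose) options are detox's own.

```shell
# clean_names DIR [apply]
# Hypothetical wrapper around detox (the wrapper is not part of detox itself).
# Without "apply" it only previews the renames; with "apply" it performs them.
clean_names() {
    dir=${1:-.}
    if [ "$2" = "apply" ]; then
        # -r: recurse into subdirectories, -v: report each rename as it happens
        detox -r -v "$dir"
    else
        # -n: dry run, show what would be renamed without touching anything
        detox -r -n "$dir"
    fi
}
```

Running clean_names ~/incoming first and reading the output before running clean_names ~/incoming apply is a cheap way to avoid surprises on a customer's files.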
See other episodes for great sed information. I like to remove DOS end of line and end of file characters:
sed -i 's/^M//g' *.txt
(here ^M is a literal carriage return, typed as Ctrl-V then Ctrl-M)
or
sed -i 's/\r//g' *.txt
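The commands above take care of the carriage returns. For the DOS end-of-file marker (Ctrl-Z, byte 0x1A), something like the following sketch could handle both in one pass. It assumes GNU sed, since it relies on -i and on the \r and \xHH escapes, and the function name dos2clean is just a name I picked for illustration.

```shell
# dos2clean FILE...
# Hypothetical helper: strip DOS line endings and a trailing Ctrl-Z in place.
# Assumes GNU sed (for -i and the \r and \xHH escapes).
dos2clean() {
    for f in "$@"; do
        # First expression: drop a carriage return at the end of every line.
        # Second expression: on the last line only, drop a trailing Ctrl-Z (0x1A).
        sed -i -e 's/\r$//' -e '$ s/\x1a$//' "$f"
    done
}
```

Usage would be dos2clean *.txt. Anchoring the carriage return to the end of the line leaves any stray \r in the middle of a line alone, which may or may not be what you want.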
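In Vim, to search every file in the argument list for a pattern:
:vim /pattern/ ##
To run a substitution in every open buffer and save each buffer that changed:
:bufdo %s/pattern/replace/ge | update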
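The same bufdo substitution can also be run from the shell without opening Vim interactively. This is only a sketch: the function name replace_in_files is hypothetical, and it assumes a Vim build that accepts -es (silent Ex mode) and that neither the pattern nor the replacement contains a slash.

```shell
# replace_in_files PATTERN REPLACEMENT FILE...
# Hypothetical helper: run a Vim substitution over several files in batch mode.
# Assumes PATTERN and REPLACEMENT contain no "/" characters.
replace_in_files() {
    pattern=$1
    replacement=$2
    shift 2
    # -N -u NONE: skip the vimrc and use non-compatible defaults
    # -es: silent Ex mode, so no screen is drawn
    # bufdo ... | update: substitute in each buffer, write it only if it changed
    vim -N -u NONE -es \
        -c "bufdo %s/$pattern/$replacement/ge | update" \
        -c 'qa!' "$@"
}
```

For example, replace_in_files colour color *.txt. The e flag on the substitution keeps Vim quiet about files where the pattern does not occur.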
Unless otherwise stated, our shows are released under a Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.