A quick fix for the ‘ascii’ codec can’t encode character error in Python and the csvkit tool

December 16, 2015 (Last Modified: March 15, 2020)

For a current project I need to migrate large volumes of CSV data into a relational database management system. The Python-driven csvkit is the Swiss Army knife of CSV tools and very handy for this purpose. However, a few CSV files caused trouble when I tried to redirect the SQL CREATE statements generated by csvkit's csvsql tool to a file.

```
csvsql -i sqlite -d ';' -e 'utf8' --db-schema test_schema --table test_table inputfile.csv > output.sql
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 46: ordinal not in range(128)
```


Adding the --verbose flag gives more clarity:

```
csvsql -i sqlite -d ';' -e 'utf8' --db-schema test_schema --table test_table inputfile.csv > output.sql
Traceback (most recent call last):
  File "/usr/local/bin/csvsql", line 9, in <module>
    load_entry_point('csvkit==0.9.1', 'console_scripts', 'csvsql')()
  File "/usr/local/lib/python2.7/dist-packages/csvkit/utilities/csvsql.py", line 161, in launch_new_instance
    utility.main()
  File "/usr/local/lib/python2.7/dist-packages/csvkit/utilities/csvsql.py", line 134, in main
    self.output_file.write('%s\n' % sql.make_create_table_statement(sql_table, dialect=self.args.dialect))
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 46: ordinal not in range(128)
```

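Although the input file is already in UTF-8, Python 2 encodes the output with the ASCII codec when it is redirected to a file. This raises an error as soon as the text contains a non-ASCII character such as an umlaut or, as in this case, the byte order mark u'\ufeff', which is most likely left over from the start of the input file.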
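The failure can be reproduced in isolation. The following is a minimal sketch, not csvkit's actual code: the sample bytes and the helper name `can_encode` are made up for illustration. It shows that decoding a BOM-prefixed line as plain UTF-8 keeps U+FEFF in the text, that the ASCII codec rejects that character, and that Python's `utf-8-sig` codec drops the BOM while decoding.

```python
# Hypothetical sample: first line of a semicolon-delimited CSV file saved with a UTF-8 BOM.
raw = b"\xef\xbb\xbfid;name"


def can_encode(text, codec):
    """Return True if `text` can be encoded with `codec`, False otherwise."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


with_bom = raw.decode("utf-8")         # the BOM survives as U+FEFF
without_bom = raw.decode("utf-8-sig")  # the BOM is stripped while decoding

print(repr(with_bom))
print(can_encode(with_bom, "ascii"))     # ASCII cannot represent U+FEFF
print(can_encode(with_bom, "utf-8"))
print(can_encode(without_bom, "ascii"))
```

Whether passing `-e utf-8-sig` to csvsql has the same effect depends on how csvkit hands the encoding name to Python, which is an assumption here. It would also only help when the BOM is the sole non-ASCII character in the generated statements.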
Adding the following code after the import statements at the top of the csvsql.py file changes Python's default encoding to UTF-8, so that this script writes the output file with the correct encoding.

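```
import sys  # needed if csvsql.py does not import it already
# Python 2 only: reload() restores setdefaultencoding, which site.py removes at startup
reload(sys)
sys.setdefaultencoding('utf-8')
```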

This is a quick fix rather than an elegant solution, but it allows me to continue with my work.
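A narrower alternative to changing the interpreter-wide default would be to wrap only the output stream in a UTF-8 writer. The sketch below assumes a byte-oriented stream, as `sys.stdout` is in Python 2. `make_utf8_writer` is a hypothetical helper and not part of csvkit, `io.BytesIO` stands in for the redirected file, and the SQL statement is made up.

```python
import codecs
import io


def make_utf8_writer(byte_stream):
    """Wrap a byte stream so that text written to it is encoded as UTF-8."""
    return codecs.getwriter("utf-8")(byte_stream)


# io.BytesIO stands in for the file that standard output is redirected to.
buffer = io.BytesIO()
writer = make_utf8_writer(buffer)

# Made-up statement containing the byte order mark that triggered the error.
statement = u'CREATE TABLE test_table ("\ufeffid" INTEGER);\n'
writer.write(statement)

print(buffer.getvalue())
```

Under Python 2, setting the environment variable `PYTHONIOENCODING=utf-8` before running the command is another commonly used way to get UTF-8 on redirected standard output without editing any file. It should apply here as long as csvsql writes through `sys.stdout`, which is an assumption.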