Find duplicate lines from a CSV

On March 21, 2016, in Development. A 1 minute read.

Finding duplicate lines in a CSV file is something I have to do from time to time, yet not often enough to remember the exact command. Plus, I’m trying to blog more often.

cut -d, -f1 file.csv | tr -d '"' | sort | uniq -dc

cut splits each line at the commas and selects the first field, tr deletes any double quotes enclosing that field, sort orders the values so that identical ones end up on adjacent lines (uniq only compares neighbouring lines), and finally uniq -dc prints only the duplicated lines, prefixing each with its count of occurrences.
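
As a quick illustration, here is the pipeline run against a small made-up file.csv (the file name and its contents are just an example):

$ cat file.csv
"alice",42
"bob",17
"alice",99
"carol",3
"bob",55
"bob",7

$ cut -d, -f1 file.csv | tr -d '"' | sort | uniq -dc
      2 alice
      3 bob

carol only appears once, so the -d flag filters it out and only the first fields that occur more than once are listed, each with its count.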

Tags: linux, shell.