March 21, 2016 · Development · A 1 minute read.
Finding duplicate values in a CSV file is something I have to do from time to time, yet not often enough to remember the exact incantation. Plus, I’m trying to blog more often.
cut -d, -f1 file.csv | tr -d '"' | sort | uniq -dc
I use cut to split each line at the commas and select the first field, then tr to delete any double quotes enclosing the field, then sort to order the lines so that duplicates end up adjacent (uniq only detects consecutive duplicates), and finally uniq -dc to show only the duplicated lines, each prefixed with its count of occurrences.
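To see the pipeline in action, here is a self-contained run against a throwaway file (the file name and data are made up for illustration):

```shell
# Create a small sample CSV with quoted names in the first column.
printf '%s\n' '"alice",1' '"bob",2' '"alice",3' '"carol",4' '"bob",5' > users.csv

# First field, quotes stripped, sorted, then only duplicates with counts.
cut -d, -f1 users.csv | tr -d '"' | sort | uniq -dc
```

This prints `2 alice` and `2 bob` (with uniq's usual leading padding); `carol` is omitted because she appears only once. One caveat worth remembering: cut splits on every comma, so this simple approach breaks if a quoted field itself contains a comma.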