Sorting Lists with More Than 1.1M Rows

Recently I ran into a very weird problem. My wife works with satellite data that produces huge text files of numbers separated by spaces, with more than 1.1 million rows in total. So how can we sort such long lists based on multiple criteria?

If you try LibreOffice, you will notice very quickly that the maximum number of rows is 1,048,576. Anything beyond that number is lost. You can of course split the list, but then you can't simply sort the numbers across the pieces. Besides, LibreOffice has a stupid limitation of only 3 sort criteria.
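A quick way to check whether a file is over that limit is to count its lines. The sketch below is only an illustration: big_sample.txt is a made-up stand-in generated with seq, not the real satellite data.

```shell
# Generate a made-up file with 1,100,000 rows to stand in for the real data.
seq 1 1100000 > big_sample.txt

# Count the rows (tr strips the padding some wc versions add).
rows=$(wc -l < big_sample.txt | tr -d ' ')

# Compare against the spreadsheet limit of 1,048,576 rows.
if [ "$rows" -gt 1048576 ]; then
  echo "$rows rows: too many for a spreadsheet"
fi
```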

The solution is called «Use the damn terminal!» Actually the command is «sort», and with a few parameters you can have the whole file sorted in a few seconds.

How? Let's say my file is called foo.txt and has 5 columns separated by a space. You want to sort this file first by the 5th column and then by the 2nd, 3rd and 4th.

$ sort -k5n,5 -k2n,2 -k3n,3 -k4n,4 foo.txt > foo_new.txt

Bam… Done!
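To see what the key syntax does, here is a minimal sketch on a tiny five-column sample. The file names sample.txt and sample_sorted.txt and all the values are invented for illustration. Each -kNn,N limits a key to column N alone and compares it numerically; without the ",N" part the key would run to the end of the line.

```shell
# Build a tiny, made-up sample with 5 space-separated columns.
printf '%s\n' \
  '1 30 2 9 7' \
  '2 10 5 1 3' \
  '3 10 4 8 7' \
  '4 10 4 2 7' \
  '5 20 1 1 3' > sample.txt

# Sort by column 5, then break ties with columns 2, 3 and 4 (all numeric).
sort -k5n,5 -k2n,2 -k3n,3 -k4n,4 sample.txt > sample_sorted.txt

cat sample_sorted.txt
# 2 10 5 1 3
# 5 20 1 1 3
# 4 10 4 2 7
# 3 10 4 8 7
# 1 30 2 9 7
```

The two rows with 3 in the last column come first, ordered by column 2. The three rows with 7 follow: two of them tie on columns 2 and 3, so column 4 decides.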

For reference, these are the options listed in the sort manual. Mandatory arguments to long options are mandatory for short options too.

-b, --ignore-leading-blanks  ignore leading blanks
-d, --dictionary-order  consider only blanks and alphanumeric characters
-f, --ignore-case  fold lower case to upper case characters
-g, --general-numeric-sort  compare according to general numerical value
-i, --ignore-nonprinting  consider only printable characters
-M, --month-sort  compare (unknown) < 'JAN' < ... < 'DEC'
-n, --numeric-sort  compare according to string numerical value
-r, --reverse  reverse the result of comparisons

Other options:

-c, --check  check whether input is sorted; do not sort
-k, --key=POS1[,POS2]  start a key at POS1, end it at POS2 (origin 1)
-m, --merge  merge already sorted files; do not sort
-o, --output=FILE  write result to FILE instead of standard output
-s, --stable  stabilize sort by disabling last-resort comparison
-S, --buffer-size=SIZE  use SIZE for main memory buffer
-t, --field-separator=SEP  use SEP instead of non-blank to blank transition
-T, --temporary-directory=DIR  use DIR for temporaries, not $TMPDIR or /tmp; multiple options specify multiple directories
-u, --unique  with -c, check for strict ordering; without -c, output only the first of an equal run
-z, --zero-terminated  end lines with 0 byte, not newline
--help  display this help and exit
--version  output version information and exit
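
A few of these options combine nicely. The sketch below uses a made-up comma-separated file (sample.csv and its values are invented for illustration) to show -t, a reversed numeric key, -o and -c together.

```shell
# Made-up comma-separated sample: label,value
printf '%s\n' 'a,3' 'b,12' 'c,7' > sample.csv

# -t sets the field separator, the "nr" on the key means numeric and reversed,
# and -o writes the result to a file instead of standard output.
sort -t, -k2nr,2 -o sample_desc.csv sample.csv

cat sample_desc.csv
# b,12
# c,7
# a,3

# -c only checks the order; it exits with status 0 when the input is sorted.
sort -t, -k2nr,2 -c sample_desc.csv && echo "already sorted"
```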