Recently I encountered a very weird problem. My wife works with satellite data that produces huge text files with numbers separated by a space, with more than 1.1 million rows in total. So how can we sort such huge lists by multiple criteria?
If you try LibreOffice, you will quickly notice that the maximum number of rows is 1,048,576. Anything beyond that number is lost. You can of course split the list, but then you can't simply sort all the numbers together. And besides this, LibreOffice has a stupid limitation of only 3 sort criteria.
The solution is called «Use the damn terminal!» The command is «sort», and with a few parameters you can have the whole file sorted by your values in a few seconds.
How? Let's say the file is called foo.txt and has 5 columns separated by a space. You want to sort this file first by the 5th column and then by the 2nd, 3rd and 4th, treating every column as a number.
$ sort -k5n,5 -k2n,2 -k3n,3 -k4n,4 foo.txt > foo_new.txt
Bam… Done!
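To see what the keys do, here is a minimal run on a made-up five-line file. The file names demo.txt and demo_sorted.txt are hypothetical, chosen so that a real foo.txt is not overwritten. In -k5n,5 the 5 before the comma starts the key at column 5, the n makes the comparison numeric, and the ,5 ends the key at column 5 as well. Each later -k is consulted only when all earlier keys are equal.

```shell
# Hypothetical five-line input with 5 space-separated integer columns
printf '%s\n' \
  '1 20 3 4 10' \
  '2 10 5 6 2' \
  '3 10 2 8 2' \
  '4 5 9 9 10' \
  '5 10 2 1 2' > demo.txt

# Key 1: column 5, numeric; keys 2-4: columns 2, 3 and 4, numeric,
# each used only to break ties left by the keys before it
sort -k5n,5 -k2n,2 -k3n,3 -k4n,4 demo.txt > demo_sorted.txt
cat demo_sorted.txt
# prints:
# 5 10 2 1 2
# 3 10 2 8 2
# 2 10 5 6 2
# 4 5 9 9 10
# 1 20 3 4 10
```

The n matters here. Without it the keys would be compared as text, and 10 would come before 2.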
For reference, here are the options listed by «sort --help». Mandatory arguments to long options are mandatory for short options too.
Ordering options:
- -b, --ignore-leading-blanks: ignore leading blanks
- -d, --dictionary-order: consider only blanks and alphanumeric characters
- -f, --ignore-case: fold lower case to upper case characters
- -g, --general-numeric-sort: compare according to general numerical value
- -i, --ignore-nonprinting: consider only printable characters
- -M, --month-sort: compare (unknown) < 'JAN' < ... < 'DEC'
- -n, --numeric-sort: compare according to string numerical value
- -r, --reverse: reverse the result of comparisons
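If the columns hold decimals or scientific notation, the choice between -n and -g matters. (That the satellite data looks like this is an assumption, not something stated above.) -n reads only an optional minus sign, digits and a decimal point, so 1e3 counts as 1. -g parses the full floating-point value. Both follow the current locale's decimal-point character, so the sketch sets LC_ALL=C to make the dot the decimal point. The file name values.txt is hypothetical.

```shell
# Hypothetical sample: one value per line, including scientific notation
printf '%s\n' 1e3 5 2.5e-1 > values.txt

# -n stops at the "e", so the keys are 1, 5 and 2.5
LC_ALL=C sort -n values.txt    # prints 1e3, 2.5e-1, 5 (one per line)

# -g parses the whole number, so the keys are 1000, 5 and 0.25
LC_ALL=C sort -g values.txt    # prints 2.5e-1, 5, 1e3 (one per line)
```

For the command above, that would mean writing -k5g,5 (and so on) in place of -k5n,5.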
Other options:
- -c, --check: check whether input is sorted; do not sort
- -k, --key=POS1[,POS2]: start a key at POS1, end it at POS2 (origin 1)
- -m, --merge: merge already sorted files; do not sort
- -o, --output=FILE: write result to FILE instead of standard output
- -s, --stable: stabilize sort by disabling last-resort comparison
- -S, --buffer-size=SIZE: use SIZE for main memory buffer
- -t, --field-separator=SEP: use SEP instead of the non-blank to blank transition
- -T, --temporary-directory=DIR: use DIR for temporaries, not $TMPDIR or /tmp; multiple options specify multiple directories
- -u, --unique: with -c, check for strict ordering; otherwise, output only the first of an equal run
- -z, --zero-terminated: end lines with 0 byte, not newline
- --help: display this help and exit
- --version: output version information and exit
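For a file of more than a million rows, a few of the «Other options» above are worth combining. This is a sketch under stated assumptions, not a tuned recipe:

- It generates a hypothetical 1,000-row sample.txt with awk.
- It gives sort a 1G memory buffer with -S. The value is arbitrary; pick one your RAM allows.
- It keeps temporary files in a scratch directory with -T.
- It writes the result with -o.
- It uses -c with the same keys to confirm that the output is in order.

Setting LC_ALL=C is an extra assumption. It makes the dot the decimal point and turns the last-resort whole-line comparison into a plain byte comparison.

```shell
# Hypothetical demo input: 1,000 rows of 5 space-separated random integers
# (column 5 only takes values 0-9, so the later keys have ties to break)
awk 'BEGIN {
  srand(1)
  for (i = 0; i < 1000; i++)
    print int(rand() * 100), int(rand() * 100), int(rand() * 100), int(rand() * 100), int(rand() * 10)
}' > sample.txt

# Scratch directory for sort's temporary files (ideally on a disk with free space)
tmpdir=$(mktemp -d)

# -S 1G : assumed memory buffer size; lower it if your machine has less RAM
# -T    : write temporary files to $tmpdir instead of $TMPDIR or /tmp
# -o    : write the result to a file instead of standard output
LC_ALL=C sort -S 1G -T "$tmpdir" -k5n,5 -k2n,2 -k3n,3 -k4n,4 \
  -o sample_sorted.txt sample.txt

# -c : only check the order using the same keys; exit status 0 means sorted
if LC_ALL=C sort -c -k5n,5 -k2n,2 -k3n,3 -k4n,4 sample_sorted.txt; then
  echo "sample_sorted.txt is sorted"
fi

# Clean up the scratch directory
rm -rf "$tmpdir"
```

Unlike the > redirection, -o may name one of the input files, so sort -o foo.txt foo.txt sorts a file in place without truncating it first.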