{"id":59,"date":"2013-04-13T11:28:00","date_gmt":"2013-04-13T08:28:00","guid":{"rendered":"https:\/\/noosphere.gr\/?p=59"},"modified":"2020-06-20T07:26:32","modified_gmt":"2020-06-20T04:26:32","slug":"sorting-lists-with-more-than-11m-rows","status":"publish","type":"post","link":"https:\/\/noosphere.gr\/index.php\/archives\/2013\/59","title":{"rendered":"Sorting Lists with More Than 1.1M Rows"},"content":{"rendered":"<p>Recently I ran into a strange problem. My wife works with satellite data, which produces huge text files of numbers separated by spaces. The files exceed 1.1 million rows. So, how can we sort such huge lists on multiple criteria?<\/p>\n<p>If you try LibreOffice, you will quickly notice that the maximum number of rows is 1,048,576. Anything beyond that number is lost. You can of course split the list, but then you can\u2019t simply sort the numbers as a whole. Besides, LibreOffice has a stupid limitation of only 3 sort criteria.<\/p>\n<p>The solution is called \u00abUse the damn terminal!\u00bb The actual command is \u00absort\u00bb, and with a few parameters you can have the whole file sorted in a few seconds.<\/p>\n<p>How? Let\u2019s say my file is called foo.txt and has 5 columns separated by a space. 
You want to sort this file first by the 5th column and then by the 2nd, 3rd and 4th.<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">$ sort -k5n,5 -k2n,2 -k3n,3 -k4n,4 foo.txt &gt; foo_new.txt<\/pre>\n<p>Bam\u2026 Done! Each -kNn,N key restricts the comparison to column N, and the n makes that comparison numeric.<\/p>\n<p>For reference, the ordering options from the sort manual. Mandatory arguments to long options are mandatory for short options too.<\/p>\n<dl compact=\"compact\">\n<dt><b>-b<\/b>, <b>--ignore-leading-blanks<\/b><\/dt>\n<dd>ignore leading blanks<\/dd>\n<dt><b>-d<\/b>, <b>--dictionary-order<\/b><\/dt>\n<dd>consider only blanks and alphanumeric characters<\/dd>\n<dt><b>-f<\/b>, <b>--ignore-case<\/b><\/dt>\n<dd>fold lower case to upper case characters<\/dd>\n<dt><b>-g<\/b>, <b>--general-numeric-sort<\/b><\/dt>\n<dd>compare according to general numerical value<\/dd>\n<dt><b>-i<\/b>, <b>--ignore-nonprinting<\/b><\/dt>\n<dd>consider only printable characters<\/dd>\n<dt><b>-M<\/b>, <b>--month-sort<\/b><\/dt>\n<dd>compare (unknown) &lt; 'JAN' &lt; ... &lt; 'DEC'<\/dd>\n<dt><b>-n<\/b>, <b>--numeric-sort<\/b><\/dt>\n<dd>compare according to string numerical value<\/dd>\n<dt><b>-r<\/b>, <b>--reverse<\/b><\/dt>\n<dd>reverse the result of comparisons<\/dd>\n<\/dl>\n<p>Other options:<\/p>\n<dl compact=\"compact\">\n<dt><b>-c<\/b>, <b>--check<\/b><\/dt>\n<dd>check whether input is sorted; do not sort<\/dd>\n<dt><b>-k<\/b>, <b>--key<\/b>=<i>POS1<\/i>[,<i>POS2<\/i>]<\/dt>\n<dd>start a key at POS1, end it at POS2 (origin 1)<\/dd>\n<dt><b>-m<\/b>, <b>--merge<\/b><\/dt>\n<dd>merge already sorted files; do not sort<\/dd>\n<dt><b>-o<\/b>, <b>--output<\/b>=<i>FILE<\/i><\/dt>\n<dd>write result to FILE instead of standard output<\/dd>\n<dt><b>-s<\/b>, <b>--stable<\/b><\/dt>\n<dd>stabilize sort by disabling last-resort comparison<\/dd>\n<dt><b>-S<\/b>, <b>--buffer-size<\/b>=<i>SIZE<\/i><\/dt>\n<dd>use SIZE for main memory 
buffer<\/dd>\n<dt><b>-t<\/b>, <b>--field-separator<\/b>=<i>SEP<\/i><\/dt>\n<dd>use SEP instead of non-blank to blank transition<\/dd>\n<dt><b>-T<\/b>, <b>--temporary-directory<\/b>=<i>DIR<\/i><\/dt>\n<dd>use DIR for temporaries, not $TMPDIR or \/tmp; multiple options specify multiple directories<\/dd>\n<dt><b>-u<\/b>, <b>--unique<\/b><\/dt>\n<dd>with <b>-c<\/b>: check for strict ordering<br \/>\notherwise: output only the first of an equal run<\/dd>\n<dt><b>-z<\/b>, <b>--zero-terminated<\/b><\/dt>\n<dd>end lines with 0 byte, not newline<\/dd>\n<dt><b>--help<\/b><\/dt>\n<dd>display this help and exit<\/dd>\n<dt><b>--version<\/b><\/dt>\n<dd>output version information and exit<\/dd>\n<\/dl>\n","protected":false},"excerpt":{"rendered":"<p>Recently I ran into a strange problem. My wife works with satellite data, which produces huge text files of numbers separated by spaces. The files exceed 1.1 million rows. 
So, how can we sort such huge lists &hellip; <a href=\"https:\/\/noosphere.gr\/index.php\/archives\/2013\/59\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[17],"tags":[26],"class_list":["post-59","post","type-post","status-publish","format-standard","hentry","category-code","tag-planet"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/noosphere.gr\/index.php\/wp-json\/wp\/v2\/posts\/59","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/noosphere.gr\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/noosphere.gr\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/noosphere.gr\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/noosphere.gr\/index.php\/wp-json\/wp\/v2\/comments?post=59"}],"version-history":[{"count":1,"href":"https:\/\/noosphere.gr\/index.php\/wp-json\/wp\/v2\/posts\/59\/revisions"}],"predecessor-version":[{"id":60,"href":"https:\/\/noosphere.gr\/index.php\/wp-json\/wp\/v2\/posts\/59\/revisions\/60"}],"wp:attachment":[{"href":"https:\/\/noosphere.gr\/index.php\/wp-json\/wp\/v2\/media?parent=59"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/noosphere.gr\/index.php\/wp-json\/wp\/v2\/categories?post=59"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/noosphere.gr\/index.php\/wp-json\/wp\/v2\/tags?post=59"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}