The standard command-line tools available on Unix-based systems are wonderful, if occasionally inscrutable. This page contains some commands I found useful.
Note: I’m writing these commands on macOS. They’re almost always identical across Unix-based systems, but there are some (small) differences in the tools that ship with each operating system: some options from Linux may not be available on macOS, or vice versa.
You can read about the available options for any tool by running man:
man <command>
👉 The man help pages can sometimes be hard to understand. tldr is a community-driven effort to offer clearer examples of each command, so you might want to look into it.
Use the $'…' format to have \n be interpreted as a newline character.
my-command --text=$'Hello\nWorld'
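To see the difference the quoting style makes, here is a minimal sketch, assuming a shell that supports the $'…' form (bash or zsh). The variable names are made up for illustration.

```shell
# Plain single quotes keep the backslash and the "n" as two separate characters.
plain='Hello\nWorld'
# The $'…' form turns \n into a real newline before any command sees it.
ansi=$'Hello\nWorld'

# Count the lines each value produces when printed as-is.
plain_lines=$(printf '%s\n' "$plain" | wc -l | tr -d ' ')
ansi_lines=$(printf '%s\n' "$ansi" | wc -l | tr -d ' ')

printf 'plain: %s line(s), ansi: %s line(s)\n' "$plain_lines" "$ansi_lines"
```

The first value stays on a single line, while the second is split in two.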
I found these commands useful for poking around a large codebase.
This command sorts all JavaScript files in the current folder by their line count, from the largest file to the smallest.
find . -name '*.js' | xargs wc -l | sort -r
Example output:
92573 total
3203 js/first-file.js
2443 js/second-file.js
1858 js/third-file.js
...
How it’s built:
- find all JavaScript files (*.js) in the current (.) folder
- pass them, via xargs, to the line counter (wc)
- sort the line count report from largest to smallest (-r)
For the command to work with spaces in file names, we need to make find and xargs work better together by using these options:
- The -print0 option makes find print all the file names in one line, separated by null characters, instead of the default newline character.
- The -0 option makes xargs assume that the inputs are separated by the null character instead of a whitespace character (such as the newline character that find uses by default).
The command now looks like this:
find . -name '*.js' -print0 | xargs -0 wc -l | sort -r
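A quick way to convince yourself that the null-separated version copes with spaces is to try it in a throwaway folder. This is only a sketch: the folder and the file names are invented for the demo.

```shell
# Work in a temporary directory so nothing real is touched.
demo_dir=$(mktemp -d)
printf 'a\nb\n' > "$demo_dir/with space.js"
printf 'c\n' > "$demo_dir/plain.js"

# Null-separated names survive the space in "with space.js".
line_report=$(find "$demo_dir" -name '*.js' -print0 | xargs -0 wc -l)
printf '%s\n' "$line_report"
```

Without -print0 and -0, xargs would split "with space.js" into two arguments and wc would complain about files that don’t exist.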
This command looks at code that matches the pattern “console.something()” and extracts all the different somethings.
find . -name '*.js' | xargs perl -nle 'print $1 if /console\.([a-z]+)\(/i' | sort -u
Example output:
error
info
log
time
timeEnd
trace
warn
How it’s built:
- find all JavaScript files (*.js) in the current directory (.)
- pass them to a perl command that extracts a RegExp pattern
- sort the extracted patterns to show only unique (-u) occurrences
💡 To make the command work with files that contain spaces in their name, we’ll need to use the -print0 / -0 combo again.
perl part
Unix-based systems have another tool for matching patterns, called grep. However, grep on macOS doesn’t have an option to extract just a part of a RegExp, so we need to use perl for this.
Glossing over the -nle arguments to the perl command (about which you can read here), the code looks for the pattern:
/console\.([a-z]+)\(/i
And prints the first capturing group, ([a-z]+), out of the regular expression.
Here, instead of finding the list of unique occurrences of a certain pattern, we count how many times each pattern occurs.
find . -name '*.js' | xargs perl -nle 'print $1 if /console\.([a-z]+)\(/i' | sort | uniq -c | sort -nr
The command will result in something like:
85 log
65 error
52 warn
19 time
17 timeEnd
9 info
6 trace
The find and perl parts are the same, but we’ve replaced sort -u with:
- sort to sort the occurrences alphabetically, then
- uniq -c to count the occurrences of unique patterns, then
- sort the patterns again by the number of occurrences (-n for numeric and -r for reversed)
This command is adapted from Software Design X-Rays by Adam Tornhill:
git log --name-only --diff-filter=M --format=format: | grep -ve '^$' | sort | uniq -c | sort -r
For this repo you’re reading, it gives us these results:
89 README.md
10 journal.md
7 typefaces.md
4 writing.md
2 oblique.md
2 ffmpeg.md
1 unix-cli.md
1 react.md
1 adobe.md
Let’s unpack how it’s built:
git part
- git log shows us the commit history for the current branch in the repository.
- --format=format:, i.e. use a custom format that is the empty string, is a trick to remove the commit information (author, date, message, etc.) from the log.
- --name-only shows the files included in each commit.
- --diff-filter=M shows us only the files that have been modified (M), thus it excludes files from commits where they’ve been added (A) or deleted (D).
grep part
The git log output will contain empty lines between the commits; we’ll exclude them from our count using grep.
We’ll use a regular expression (-e). The ^$ pattern (start of line immediately followed by end of line) matches any empty line, but by using the -v flag to invert the match we can pick only the lines that don’t match, i.e. are not empty.
sort / uniq part
- sort the lines
- uniq -c to count the occurrences of unique lines
- sort the lines again based on the count, reversed with -r (from largest count to lowest count)
If we replace grep -ve '^$' with grep -e 'some pattern', we can limit our count to files whose names match a pattern. For example, we can target JavaScript files with grep -e '\.js$' (notice we’ve removed the -v invert flag). The full command becomes:
git log --name-only --diff-filter=M --format=format: | grep -e '\.js$' | sort | uniq -c | sort -r
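The counting half of the pipeline doesn’t need a repository to be tried out. Below is a sketch that stands in for the git log output with a few invented lines (file names separated by empty lines). It uses sort -rn for the last step, a small substitution of mine so that the counts are compared as numbers.

```shell
# Simulated "git log --name-only" output: names with blank lines between commits.
counts=$(printf '%s\n' 'README.md' 'app.js' '' 'app.js' '' 'README.md' 'app.js' \
  | grep -ve '^$' \
  | sort \
  | uniq -c \
  | sort -rn)

# uniq -c pads the counts with spaces; awk tidies that up for display.
printf '%s\n' "$counts" | awk '{ print $1, $2 }'
```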
git diff --stat origin/master HEAD | awk '{ print $3, $1 }' | sort -rn
The general formula is:
diff <( ... command 1 goes here ... ) <( ... command 2 goes here ... )
It works even with curl (the -s is for silent):
diff <(curl -s http://example.com/1) <(curl -s http://example.com/2)
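The same formula can be tried offline. A sketch assuming bash or zsh (the <( … ) form is not available in plain sh), with two invented fruit lists standing in for the real commands:

```shell
# diff exits with status 1 when the inputs differ, hence the "|| true".
difference=$(diff <(printf 'apple\nbanana\n') <(printf 'apple\ncherry\n') || true)

printf '%s\n' "$difference"
```

Each <( … ) behaves like a temporary file holding the output of the command inside it, which is why diff is happy to read from it.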
Some static site generators’ multilanguage features work by creating separate Markdown files for each language, e.g. about.md for English and about.de.md for German.
To find which .md files don’t have their equivalent .de.md file:
join -v 1 \
<(find . -name "*.md" -not -name "*.de.md" | sort) \
<(find . -name "*.de.md" | sed -E "s/\.de\.md/\.md/" | sort)
The first find gets us the list of .md files in English (i.e. Markdown files that don’t end in .de.md).
The second find gets us the list of .md files in German (ending in .de.md), then we use sed to replace .de.md with .md.
Note: We sort the output of both find commands because the join command expects sorted input. find makes no promise about the order of its output, so the sort step is worth keeping.
The join -v 1 prints out all the lines in file 1 (English) which are not matched by a line in file 2 (German).
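Here is join -v 1 in isolation, as a sketch that uses two small, already sorted lists written to temporary files. The file names in the lists are invented.

```shell
work_dir=$(mktemp -d)

# File 1: the English pages. File 2: the German pages, already renamed to .md.
printf '%s\n' './about.md' './contact.md' './index.md' > "$work_dir/english.txt"
printf '%s\n' './about.md' './index.md' > "$work_dir/german.txt"

# Print the lines of file 1 that have no partner in file 2.
missing=$(join -v 1 "$work_dir/english.txt" "$work_dir/german.txt")
printf '%s\n' "$missing"
```

Only ./contact.md is printed, since it’s the one page without a German counterpart.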
Now, let’s copy over the English version for German files we haven’t found:
join -v 1 \
<(find . -name "*.md" -not -name "*.de.md" | sort) \
<(find . -name "*.de.md" | sed -E "s/\.de\.md/\.md/" | sort) \
| sed "p;s/\.md/\.de\.md/" | xargs -n2 cp
We reach for sed once more to produce the original line (with p;), followed by the same line with .md changed back to .de.md. With this input:
my-file.md
my-other-file.md
we get:
my-file.md
my-file.de.md
my-other-file.md
my-other-file.de.md
xargs takes the input two lines at a time (-n2) and uses them as the first and the second argument to cp, respectively. Something like:
cp my-file.md my-file.de.md
cp my-other-file.md my-other-file.de.md
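Before letting cp loose on real files, it can help to do a dry run by putting echo in front of it. A sketch using the two file names from above (I’ve also anchored the pattern with $ so that only the extension at the end of the name is rewritten):

```shell
# "echo cp" prints the commands that would run instead of running them.
pairs=$(printf '%s\n' 'my-file.md' 'my-other-file.md' \
  | sed 'p;s/\.md$/.de.md/' \
  | xargs -n2 echo cp)

printf '%s\n' "$pairs"
```

Once the printed commands look right, dropping the echo runs them for real.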
Alternatively, we can use the -n option in cp, which only makes a copy of the file if the destination file doesn’t already exist, to avoid having to run two separate finds and a join:
find . -name "*.md" -not -name "*.de.md" | sed "p;s/\.md/\.de\.md/" | xargs -n2 cp -n
To work around the limitations of the format, I needed to convert hundreds of files from JSON to JS modules that export plain objects.
As a variation of the previous pattern, to rename (i.e. move) files matching a certain pattern:
find . -name "*.json" | sed "p;s/\.json/\.data\.js/" | xargs -n2 mv
export default
We want to change the files from:
sample.json
{
"some_key": "some_value"
}
to:
sample.data.js
export default {
"some_key": "some_value"
}
For this, we’ll use sed to replace the content of the first line:
find . -name '*.data.js' | xargs -n1 sed -i "" '1s/^.*$/export default \{/'
- The -i flag instructs sed to make the substitution in-place (in the same file); for the BSD version of sed that ships with macOS, doing that without making backup files is done with -i "";
- 1 applies the s(ubstitute) command to the first line in each file, and the ^.*$ regular expression matches the entire line.
Note: when addressing specific lines, we must use the -n1 flag on xargs to have sed run on each individual file. We want:
sed file1
sed file2
sed file3
# etc.
Instead of:
sed file1 file2 file3 ...
Because the line addressing works cumulatively across all input files, meaning 1 only matches the first line in the first file.
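The cumulative numbering is easy to observe without editing anything. A sketch that prints line 1 (with -n and the p command) from two invented files, first in a single sed call and then one call per file. It deliberately leaves out -i, since in-place behaviour may differ between sed implementations.

```shell
scratch=$(mktemp -d)
printf 'first of a\nsecond of a\n' > "$scratch/a.txt"
printf 'first of b\nsecond of b\n' > "$scratch/b.txt"

# One sed call for both files: line 1 exists only once, in the first file.
together=$(sed -n '1p' "$scratch/a.txt" "$scratch/b.txt")

# One sed call per file, via xargs -n1: every file gets its own line 1.
separately=$(find "$scratch" -name '*.txt' | sort | xargs -n1 sed -n '1p')

printf 'together:\n%s\nseparately:\n%s\n' "$together" "$separately"
```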
import statement
One of the reasons to switch from JSON to JS was to make some strings amenable to localization. For that I needed to add an import statement at the beginning of each file:
import { t } from 'js/i18n';
// rest of the file.
A word of caution about the sed that comes with macOS:
\n characters
To match the beginning of the first line in sed, we can use 1s/^/.../. We’d like to add our import statement there, followed by a couple of newlines. At this point, I was convinced there’s no way to coerce macOS sed to insert newlines, so I’m thinking: MacBook keyboards come with a surprisingly handy § button, can we work around the \n problem by using the separate tr program? Something like:
sed "1s#^#import { t } from 'js/i18n';§§#" my-file.data.js | tr § '\n'
There’s a lot going on, so let’s unpack it. For the sed part:
- 1 addresses the first line in the file;
- s is the substitute command, used here with the # character instead of the customary / character (you can use any other character, for that matter), because we have the latter as part of our replacement text and it makes things a bit easier to read;
- ^ matches the beginning of the line;
- §§, which are supposed to become a couple of newline characters.
Since we want to pipe the sed output to the tr command, which will replace each occurrence of the § character with \n, we’re not using the -i (in-place) flag like before.
And here’s our first roadblock. If we were to redirect the output back to the original file (like -i did before):
sed "1s#^#...§§#" my-file.data.js | tr § '\n' > my-file.data.js
We’d notice that my-file.data.js ends up empty. That’s because redirecting stdout to our file via the > operator immediately opens (and truncates) our file before sed even has the chance to read it.
There’s a command called sponge that you need to install separately (with brew install moreutils) for this exact purpose:
sed "1s#^#...§§#" my-file.data.js | tr § '\n' | sponge my-file.data.js
But using sponge introduces a new dependency. Surely there must be some sort of POSIX command, or a combination thereof, to obtain a similar result? tee sounds like it would allow you to write back to the original file:
sed "1s#^#...§§#" my-file.data.js | tr § '\n' | tee my-file.data.js
But the way tee works makes it an unreliable replacement for sponge, and it does seem like vanilla alternatives revolve around writing to a temporary file, then mv-ing it over the original file. Ugh.
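For the record, here is what the truncation and the temporary-file workaround look like side by side. It’s a sketch on a scratch file, with an invented substitution standing in for the real one.

```shell
note=$(mktemp)
printf 'hello\n' > "$note"

# The shell truncates the file before sed starts, so sed reads nothing.
sed 's/hello/goodbye/' "$note" > "$note"
size_after_clobber=$(wc -c < "$note" | tr -d ' ')

# Workaround: write to a temporary file, then move it over the original.
printf 'hello\n' > "$note"
sed 's/hello/goodbye/' "$note" > "$note.tmp" && mv "$note.tmp" "$note"
result=$(cat "$note")

printf 'size after clobber: %s, after workaround: %s\n' "$size_after_clobber" "$result"
```

The && makes sure the original is only replaced if sed finished without an error.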
We’re already stuck, and we haven’t even figured out how to make this idea work on a whole batch of files, since running multiple commands with xargs entails writing the compound command as a string and running it with sh, making the entire command something along the lines of:
find . -name '*.data.js' | xargs -n1 -I__FILE__ sh -c "sed \"1s#^#import { t } from 'js/i18n';§§#\" __FILE__ | tr § '\n' | sponge __FILE__"
Yikes!
Sometimes you read the wrong Stack Overflow answer, or read the right answer poorly, and off you go on a false premise.
It turns out you actually can insert \n characters with sed on macOS?! Our struggle resolves to this pithy one- (well, technically two-) liner:
find . -name '*.txt' | xargs -n1 \
sed -i "" $'1s#^#import { t } from \'js/i18n\';\\\n\\\n#'
Using $'...' makes the whole sed command a C-style string which replaces \\ with \ and \n with a literal newline, making it equivalent to:
1s#^#import { t } from 'js/i18n';\
\
#
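To check the result without touching any files, the same sed script can be run on a single line piped in through stdin. A sketch assuming bash or zsh for the $'…' quoting; the input line is made up.

```shell
# No -i here: sed reads from the pipe and prints to stdout.
prepended=$(printf 'export default {\n' \
  | sed $'1s#^#import { t } from \'js/i18n\';\\\n\\\n#')

printf '%s\n' "$prepended"
```

The output has three lines: the import statement, an empty line, and the original first line.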
For the sake of completeness, you might also want to take a look at using cat and echo to prepend a line.
Okay, now that we’ve imported the t function, let’s apply it to some strings in our JS file; that is, to turn this:
import { t } from 'js/i18n';
export default {
"some_key": "some_value"
}
to this:
import { t } from 'js/i18n';
export default {
"some_key": t`some_value`
}
This is done by capturing some_value and using a back-reference to wrap t around it:
find . -name '*.txt' | xargs -n1 \
sed -E 's/("some_key"): "([^"]*)"/\1: t`\2`/g'
By default, sed works with BRE (basic regular expressions), which lack some of the comforts of their modern counterparts. Running it with the -E flag interprets regular expressions as extended.
Note: we match some_value by [^"]*, i.e. a sequence of zero or more non-quote characters. This is not entirely correct, as JavaScript strings can contain (escaped) quote characters, or span multiple lines.
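The back-reference can be tried on a single line before running it over a batch of files. A sketch using the sample key and value from above:

```shell
# \1 is the quoted key, \2 is the value without its quotes.
wrapped=$(printf '%s\n' '  "some_key": "some_value"' \
  | sed -E 's/("some_key"): "([^"]*)"/\1: t`\2`/g')

printf '%s\n' "$wrapped"
```

The single quotes around the sed script matter here: they stop the shell from treating the backticks as command substitution.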