The standard command-line tools available on Unix-based systems are wonderful, if occasionally inscrutable. This page contains some commands I found useful.
Note: I’m writing these commands on macOS, and while they’re almost always identical across Unix-based systems, there are some (small) differences in the tools that ship with each operating system. Some options from Linux may not be available on macOS, or vice versa.
You can read about the available options for any tool by running man:
man <command>
👉 The man help pages can sometimes be hard to understand. tldr is a community-driven effort to offer clearer examples of each command, so you might want to look into it.
Use the $'…' format to have \n be interpreted as a newline character.
my-command --text=$'Hello\nWorld'
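To see what the $'…' quoting changes, here is a minimal sketch that counts the lines printf produces for each form. It assumes bash or zsh (some plain sh implementations do not understand $'…'), and the variable names are only illustrative.

```shell
# Assumes bash or zsh; the variable names are illustrative.
plain='Hello\nWorld'    # ordinary quotes: backslash and n stay two literal characters
ansi=$'Hello\nWorld'    # $'…' quotes: \n becomes a real newline character

# Count how many lines each value occupies when printed.
plain_lines=$(printf '%s\n' "$plain" | wc -l | tr -d ' ')
ansi_lines=$(printf '%s\n' "$ansi" | wc -l | tr -d ' ')

echo "plain quotes: $plain_lines line, ANSI-C quotes: $ansi_lines lines"
```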
I found these commands useful for poking around a large codebase.
This command sorts all JavaScript files in the current folder by their line count, from the largest file to the smallest.
find . -name '*.js' | xargs wc -l | sort -r
Example output:
92573 total
3203 js/first-file.js
2443 js/second-file.js
1858 js/third-file.js
...
How it’s built:
- find all JavaScript files (*.js) in the current (.) folder
- pass them, via xargs, to the line counter (wc)
- sort the line count report from largest to smallest (-r)
For the command to work with spaces in file names, we need to make find and xargs work better together by using these options:
- -print0 makes find print all the file names on one line, separated by null characters instead of the default newline character.
- -0 makes xargs assume that the inputs are separated by the null character instead of whitespace (such as the newline character that find uses by default).
The command now looks like this:
find . -name '*.js' -print0 | xargs -0 wc -l | sort -r
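A quick way to convince yourself that the -print0 / -0 pair copes with spaces is to try it in a throwaway folder. This is only a sketch: the folder and file names are made up, and I've used sort -rn (numeric) rather than plain -r so the order does not depend on how wc pads its numbers.

```shell
# Throwaway folder with one awkward file name (all names are hypothetical).
demo=$(mktemp -d)
printf 'a\nb\nc\n' > "$demo/with space.js"
printf 'a\n' > "$demo/plain.js"

# Same pipeline as above, run inside the demo folder.
report=$(cd "$demo" && find . -name '*.js' -print0 | xargs -0 wc -l | sort -rn)
echo "$report"

rm -rf "$demo"
```

The report should list the total first, then "with space.js" with its 3 lines as a single entry, then plain.js.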
This command looks at code that matches the pattern “console.something()” and extracts all the different somethings.
find . -name '*.js' | xargs perl -nle 'print $1 if /console\.([a-z]+)\(/i' | sort -u
Example output:
error
info
log
time
timeEnd
trace
warn
How it’s built:
- find all JavaScript files (*.js) in the current directory (.)
- pass them, via xargs, to a perl command that extracts a RegExp pattern
- sort the extracted patterns to show only -unique occurrences
💡 To make the command work with files that contain spaces in their name, we’ll need to use the -print0 / -0 combo again.
perl part
Unix-based systems have another tool for matching patterns, called grep. However, grep on macOS doesn’t have an option to extract just a part of a RegExp, so we need to use perl for this.
Glossing over the -nle arguments to the perl command (they are documented in the perlrun manual page), the code looks for the pattern:
/console\.([a-z]+)\(/i
And prints the first capturing group ([a-z]+) out of the regular expression.
Here, instead of finding the list of unique occurrences of a certain pattern, we count how many times a certain pattern occurs.
find . -name '*.js' | xargs perl -nle 'print $1 if /console\.([a-z]+)\(/i' | sort | uniq -c | sort -nr
The command will result in something like:
85 log
65 error
52 warn
19 time
17 timeEnd
9 info
6 trace
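The counting stage can be tried on its own with a handful of words. In this sketch the input words are invented, and the final awk is my addition, there only to strip the padding that uniq -c puts in front of the counts.

```shell
# Count duplicates in a made-up list, most frequent first.
counts=$(printf '%s\n' log error log warn log error \
  | sort \
  | uniq -c \
  | sort -nr \
  | awk '{ print $1, $2 }')

echo "$counts"
```

uniq only collapses adjacent duplicates, which is why the first sort is needed before it.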
The find and perl parts are the same, but we’ve replaced sort -u with:
- sort to sort the occurrences alphabetically, then
- uniq -c to count the occurrences of unique patterns, then
- sort the patterns again by the number of occurrences (-numeric and -reversed)
This command, adapted from Software Design X-Rays by Adam Tornhill, ranks the files in a Git repository by the number of commits that modified them:
git log --name-only --diff-filter=M --format=format: | grep -ve '^$' | sort | uniq -c | sort -r
For this repo you’re reading, it gives us these results:
89 README.md
10 journal.md
7 typefaces.md
4 writing.md
2 oblique.md
2 ffmpeg.md
1 unix-cli.md
1 react.md
1 adobe.md
Let’s unpack how it’s built:
git part
- git log shows us the commit history for the current branch in the repository.
- --format=format:, i.e. use a custom format that is the empty string, is a trick to remove the commit information (author, date, message, etc.) from the log.
- --name-only shows the files included in each commit.
- --diff-filter=M shows us only the files that have been Modified, thus it excludes files from commits where they’ve been Added or Removed.
grep part
The git log will contain empty lines between the commits; we’ll exclude them from our count using grep.
We’ll use a regular expression (-e). The ^$ pattern (start of line immediately followed by end of line) matches any empty line, but by using the -v flag to invert the pattern we can pick only the lines that don’t match, i.e. are not empty.
sort / uniq part
- sort the lines
- uniq -c to count the occurrences of unique lines
- sort the lines again based on the count, -reversed (from largest count to lowest count)
If we replace grep -ve '^$' with grep -e 'some pattern', we can limit our count to files whose names match a pattern. For example, we can target JavaScript files with grep -e '\.js$'; notice we’ve removed the -v (invert) flag. The full command becomes:
git log --name-only --diff-filter=M --format=format: | grep -e '\.js$' | sort | uniq -c | sort -r
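To compare the two grep filters without needing a repository, here is a sketch that runs them on a stand-in for the git log output. The fake_log function and the file names in it are hypothetical; I've also used sort -rn and a trailing awk to get numeric ordering and unpadded counts.

```shell
# Hypothetical stand-in for: git log --name-only --diff-filter=M --format=format:
fake_log() {
  printf '%s\n' '' 'src/app.js' 'README.md' '' 'src/app.js' 'src/util.js'
}

# Drop the empty lines and count every file.
all_files=$(fake_log | grep -ve '^$' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }')

# Keep only the lines ending in .js (which also drops the empty ones).
js_files=$(fake_log | grep -e '\.js$' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }')

echo "$all_files"
echo "---"
echo "$js_files"
```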
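This command lists the files that differ between origin/master and HEAD, sorted by the number of changed lines, largest first:
git diff --stat origin/master HEAD | awk '{ print $3, $1 }' | sort -rn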
To diff the output of two commands, the general formula is:
diff <( ... command 1 goes here ... ) <( ... command 2 goes here ... )
It works even with curl (the -s is for silent):
diff <(curl -s http://example.com/1) <(curl -s http://example.com/2)
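The same formula can be tried offline, with printf standing in for the two commands. A minimal sketch, assuming bash or zsh, since the <( … ) syntax (process substitution) is not available in every sh:

```shell
# Assumes bash or zsh. diff exits with status 1 when the inputs differ,
# so "|| true" keeps the sketch usable in scripts that stop on errors.
changes=$(diff <(printf 'a\nb\nc\n') <(printf 'a\nx\nc\n') || true)

echo "$changes"
```

The output marks the second line as changed, with "< b" for the first command and "> x" for the second.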
Some static site generators’ multilanguage features work by creating separate Markdown files for each language, e.g. about.md for English and about.de.md for German.
To find which .md files don’t have their equivalent .de.md file:
join -v 1 \
<(find . -name "*.md" -not -name "*.de.md" | sort) \
<(find . -name "*.de.md" | sed -E "s/\.de\.md/\.md/" | sort)
The first find gets us the list of .md files in English (i.e. Markdowns that don’t end in .de.md).
The second find gets us the list of .md files in German (ending in .de.md); we then use sed to replace .de.md with .md.
Note: We sort the output of both find commands because the join command expects sorted input. find makes no promise about the order of its output, so the sort is best kept.
The join -v 1 prints out all the lines in file 1 (English) which are not matched by a line in file 2 (German).
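Here is the same join on two small hand-written lists instead of find output, as a sketch (bash or zsh assumed for the <( … ) syntax; the file names are invented):

```shell
# English pages on the left, German pages (renamed to .md) on the right.
missing=$(join -v 1 \
  <(printf '%s\n' ./about.md ./contact.md ./index.md | sort) \
  <(printf '%s\n' ./about.de.md ./index.de.md | sed -E 's/\.de\.md$/.md/' | sort))

echo "$missing"
```

Only ./contact.md is printed, because it is the one English page with no German counterpart.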
Now, let’s copy over the English version for German files we haven’t found:
join -v 1 \
<(find . -name "*.md" -not -name "*.de.md" | sort) \
<(find . -name "*.de.md" | sed -E "s/\.de\.md/\.md/" | sort) \
| sed "p;s/\.md/\.de\.md/" | xargs -n2 cp
We reach for sed once more to produce the original line (with p;), followed by the same line with .md changed back to .de.md. With this input:
my-file.md
my-other-file.md
we get:
my-file.md
my-file.de.md
my-other-file.md
my-other-file.de.md
xargs takes the input two lines at a time (-n2) and uses them as the first and second arguments to cp, respectively. Something like:
cp my-file.md my-file.de.md
cp my-other-file.md my-other-file.de.md
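Before letting xargs loose on real files, it can help to do a dry run that prints the cp commands instead of running them. A sketch of that idea, where putting echo in front of cp is my addition:

```shell
# Dry run: "echo cp" prints each command rather than executing it.
plan=$(printf '%s\n' my-file.md my-other-file.md \
  | sed 'p;s/\.md$/.de.md/' \
  | xargs -n2 echo cp)

echo "$plan"
```

Once the printed commands look right, dropping the echo makes xargs run them for real.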
Alternatively we can use the -n option in cp, which only makes a copy of the file if the destination file doesn’t already exist, to avoid having to run two separate finds and a join:
find . -name "*.md" -not -name "*.de.md" | sed "p;s/\.md/\.de\.md/" | xargs -n2 cp -n
To work around the limitations of the format, I needed to convert hundreds of files from JSON to JS modules that export plain objects.
As a variation of the previous pattern, to rename (i.e. move) files matching a certain pattern:
find . -name "*.json" | sed "p;s/\.json/\.data\.js/" | xargs -n2 mv
export default
We want to change the files from:
sample.json
{
"some_key": "some_value"
}
to:
sample.data.js
export default {
"some_key": "some_value"
}
For this, we’ll use sed to replace the content of the first line:
find . -name '*.data.js' | xargs -n1 sed -i "" '1s/^.*$/export default \{/'
- The -i flag instructs sed to make the substitution in-place (in the same file); for the BSD version of sed that ships with macOS, doing that without making backup files is done with -i "".
- 1 applies the s(ubstitute) command to the first line in each file, and the ^.*$ regular expression matches the entire line.
Note: when addressing specific lines, we must use the -n1 flag on xargs to have sed run on each individual file. We want:
sed file1
sed file2
sed file3
# etc.
Instead of:
sed file1 file2 file3 ...
Because the line addressing works cumulatively across all input files, meaning 1 only matches the first line in the first file.
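The cumulative line numbering is easy to observe with sed -n '1p', which prints only line 1. This sketch uses two throwaway files with made-up names, and leaves out -i so that nothing is edited:

```shell
# Two small throwaway files (names are hypothetical).
demo=$(mktemp -d)
printf 'first of a\nsecond of a\n' > "$demo/a.txt"
printf 'first of b\nsecond of b\n' > "$demo/b.txt"

# One sed run over both files: line 1 exists only once in the combined input.
together=$(sed -n '1p' "$demo/a.txt" "$demo/b.txt")

# One sed run per file, via xargs -n1: each file has its own line 1.
one_by_one=$(printf '%s\n' "$demo/a.txt" "$demo/b.txt" | xargs -n1 sed -n '1p')

echo "together:   $together"
echo "one by one: $one_by_one"

rm -rf "$demo"
```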
import statement
One of the reasons to switch from JSON to JS was to make some strings amenable to localization. For that I needed to add an import statement at the beginning of each file:
import { t } from 'js/i18n';
// rest of the file.
A word of caution about the sed that comes with macOS:
\n characters
To match the beginning of the first line in sed, we can use 1s/^/.../. We’d like to add our import statement there, followed by a couple of newlines. At this point, I was convinced there was no way to coerce macOS sed to insert newlines, so I thought: MacBook keyboards come with a surprisingly handy § key; can we work around the \n problem by using the separate tr program? Something like:
sed "1s#^#import { t } from 'js/i18n';§§#" my-file.data.js | tr § '\n'
There’s a lot going on, so let’s unpack it. For the sed part:
- 1 addresses the first line in the file;
- s is the substitute command, used here with the # character instead of the customary / character (you can use any other character, for that matter), because we have the latter as part of our replacement text and it makes things a bit easier to read;
- ^ matches the beginning of the line;
- the replacement text ends with §§, which are supposed to become a couple of newline characters.
Since we want to pipe the sed output to the tr command (which will replace each occurrence of the § character with \n), we’re not using the -i (in-place) flag like before.
And here’s our first roadblock. If we were to pipe the output back to the original file (like -i did before):
sed "1s#^#...§§#" my-file.data.js | tr § '\n' > my-file.data.js
We’d notice that my-file.data.js ends up empty. That’s because redirecting stdout to our file via the > operator immediately opens (and truncates) our file before sed even has the chance to read it.
There’s a command called sponge that you need to install separately (with brew install moreutils) for this exact purpose:
sed "1s#^#...§§#" my-file.data.js | tr § '\n' | sponge my-file.data.js
But using sponge introduces a new dependency. Surely there must be some sort of POSIX command, or a combination thereof, to obtain a similar result? tee sounds like it would allow you to write back to the original file:
sed "1s#^#...§§#" my-file.data.js | tr § '\n' | tee my-file.data.js
But tee truncates its output file as soon as it starts, which races with sed still reading that same file, so it is an unreliable replacement for sponge. The vanilla alternatives seem to revolve around writing to a temporary file, then mv-ing it over the original file. Ugh.
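For reference, the temporary-file route could look something like this sketch. The demo file is created on the spot, and I've swapped the § placeholder for an ASCII @ (assumed not to occur in the file) so that the example also behaves with tr implementations that work on bytes rather than characters.

```shell
# Throwaway file standing in for my-file.data.js.
demo=$(mktemp -d)
file="$demo/my-file.data.js"
printf 'export default {\n}\n' > "$file"

# Write the result to a temporary file, then move it over the original.
tmp=$(mktemp)
sed "1s#^#import { t } from 'js/i18n';@@#" "$file" | tr '@' '\n' > "$tmp" && mv "$tmp" "$file"

result=$(cat "$file")
echo "$result"

rm -rf "$demo"
```

Because the original file is only replaced after sed has finished reading it, it no longer ends up empty.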
We’re already stuck, and we haven’t even figured out how to make this idea work on a whole batch of files. Running multiple commands with xargs entails writing the compound command as a string and running it with sh, making the entire command something along the lines of:
find . -name '*.data.js' | xargs -n1 -I__FILE__ sh -c "sed \"1s#^#import { t } from 'js/i18n';§§#\" __FILE__ | tr § '\n' | sponge __FILE__"
Yikes!
Sometimes you read the wrong Stack Overflow answer, or read the right answer poorly, and off you go on a false premise.
It turns out you actually can insert \n characters with sed on macOS?! Our struggle resolves to this pithy one- (well, technically two-) liner:
find . -name '*.data.js' | xargs -n1 \
sed -i "" $'1s#^#import { t } from \'js/i18n\';\\\n\\\n#'
Using $'...' makes the whole sed command a C-style string which replaces \\ with \ and \n with a literal newline, making it equivalent to:
1s#^#import { t } from 'js/i18n';\
\
#
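The same sed script can be previewed on standard input, without -i, before pointing it at real files. A sketch, assuming bash or zsh for the $'…' quoting, with a two-line made-up input:

```shell
# Preview only: reads from stdin and prints the result, no file is modified.
preview=$(printf 'export default {\n}\n' \
  | sed $'1s#^#import { t } from \'js/i18n\';\\\n\\\n#')

echo "$preview"
```

The output starts with the import line, followed by an empty line, then the original two lines.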
For the sake of completeness, you might also want to take a look at using cat and echo to prepend a line.
Okay, now that we’ve imported the t function, let’s apply it to some strings in our JS file; that is, to turn this:
import { t } from 'js/i18n';
export default {
"some_key": "some_value"
}
to this:
import { t } from 'js/i18n';
export default {
"some_key": t`some_value`
}
This is done by capturing some_value and using a back-reference to wrap t around it:
find . -name '*.data.js' | xargs -n1 \
sed -E 's/("some_key"): "([^"]*)"/\1: t`\2`/g'
By default, sed works with BRE (basic regular expressions), which lack some of the comforts of their modern counterparts. Running it with the -E flag interprets regular expressions as extended.
Note: we match some_value by [^"]*, i.e. a sequence of zero or more non-quote characters. This is not entirely correct, as JavaScript strings can contain (escaped) quote characters, or span multiple lines.
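To make the BRE versus ERE difference concrete, here is a sketch that applies the substitution to a single sample line in both flavours. The BRE spelling, with its backslash-escaped parentheses, is my own rewrite of the same pattern.

```shell
line='  "some_key": "some_value"'

# Extended regular expressions (-E): plain parentheses create groups.
ere=$(printf '%s\n' "$line" | sed -E 's/("some_key"): "([^"]*)"/\1: t`\2`/g')

# Basic regular expressions (the default): groups need \( and \).
bre=$(printf '%s\n' "$line" | sed 's/\("some_key"\): "\([^"]*\)"/\1: t`\2`/g')

echo "$ere"
echo "$bre"
```

Both commands print the same line, with the value wrapped in t`…`.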