For all the flak Perl gets as a language, I constantly find myself returning to it to whip up small but effective utilities for text processing and regular expressions. Over time, I built up a scratch pad of tiny scripts and one-liners that I frequently reference to inspect text files, cut and tidy up data, and do a little housekeeping across files and directories. Going over some old files the other day, I realized how often these one-liners have come in handy, so I figured I’d share some here.
1. Syntax / Compile check with the -c flag.
It’s surprising how many people use Perl and aren’t aware of the -c flag, which compiles your program and checks it for syntax errors without running the main program. Catching those errors up front can save you a tremendous amount of time and headaches, so get in the habit of saving your script and running:
perl -c myscript.pl
before you execute it. If the syntax is clean, Perl reports “myscript.pl syntax OK”; otherwise it lists the errors it found. NOTE: As we all know, just because the code compiles does not mean it will do what you want it to do! That is up to you to verify :).
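If you want to check several scripts at once, the same flag can be driven from a small Perl script. This is a minimal sketch, not part of the original toolkit: compiles_ok and write_script are hypothetical helper names, and it assumes the perl running the script ($^X) is the one you want to check with. Keep in mind that -c skips the main program but still runs BEGIN blocks and use statements, because those execute at compile time.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return true if `perl -c` compiles $path cleanly.
# perl -c prints "<file> syntax OK" or the compile errors on STDERR and
# exits with status 0 only on success. BEGIN blocks and `use` statements
# in the checked file still run during the check.
sub compiles_ok {
    my ($path) = @_;
    # List-form system() bypasses the shell; $^X is the current perl binary.
    my $status = system($^X, '-c', $path);
    return $status == 0;
}

# Hypothetical helper: write a code snippet to a temporary .pl file
# (deleted when this program exits) and return its path.
sub write_script {
    my ($code) = @_;
    my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
    print {$fh} $code, "\n";
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $good = write_script('print "hello\n";');
my $bad  = write_script('print "hello\n" my $x = ;');   # missing semicolon

for my $script ($good, $bad) {
    printf "%s: %s\n", $script, compiles_ok($script) ? 'OK' : 'syntax error';
}
```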
2. Inspect a delimited file inline from the command line.
Say you have a tab-delimited text file of demographic information, and you have some questions about the data it contains. For example, how many customers in the file are in the state of New York? How many students have a major of “BIOLOGY”? Normally, you might load the file into a database or Excel and inspect it there, but you can also use Perl to query the file inline, from the command line:
perl -F'\t' -lane 'EXPRESSION'
Here the -F flag specifies the delimiter used to split each input line (in this case '\t', for tab). The other flags do the rest: -l strips the trailing newline from each line (and adds one back when you print), -a autosplits each line into the array @F, -n wraps your code in an implicit loop that processes the input one line at a time, and -e lets you pass the code directly on the command line. This might sound a bit confusing, so let’s walk through an example. Say you have a tab-delimited file, customers.tab, with some basic demographic information:
ID FIRST_NAME LAST_NAME CITY STATE ZIP
1 JOHN DOE NEW YORK NY 12345
2 JERRY SEINFELD WASHINGTON DC 78901
3 GEORGE COSTANZA QUEENS NY 12346
4 FRANK LANDERS PHILADELPHIA PA 19010
5 TIM SANDERS MIAMI FL 12225
Going back to our example, if we wanted to see how many customers are in the state of New York, we would run:
perl -F'\t' -lane 'print if $F[4] eq "NY"' customers.tab | wc -l
Our one-liner splits each line of the file into an array called @F, and our EXPRESSION prints every record whose fifth column, $F[4] (arrays are zero-indexed, and STATE is the fifth column), equals “NY” (New York customers). We then pipe the output to wc -l to count the records, which gives our answer (2 for the sample data above) directly from the command line. No databases or Excel required. Beautiful.
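To see what the flags are doing, here is a rough hand-expansion of the New York one-liner as a standalone script. It is a sketch rather than Perl’s exact internal expansion: it reads the sample rows from an in-memory string instead of customers.tab, and collects matches in a list (the hypothetical matching_rows sub) so the count can be checked without wc -l.

```perl
use strict;
use warnings;

# The sample customer rows, joined with real tab characters.
my @records = (
    [qw(ID FIRST_NAME LAST_NAME CITY STATE ZIP)],
    ['1', 'JOHN',   'DOE',      'NEW YORK',     'NY', '12345'],
    ['2', 'JERRY',  'SEINFELD', 'WASHINGTON',   'DC', '78901'],
    ['3', 'GEORGE', 'COSTANZA', 'QUEENS',       'NY', '12346'],
    ['4', 'FRANK',  'LANDERS',  'PHILADELPHIA', 'PA', '19010'],
    ['5', 'TIM',    'SANDERS',  'MIAMI',        'FL', '12225'],
);
my $data = join '', map { join("\t", @$_) . "\n" } @records;

# Roughly what perl -F'\t' -lane 'print if $F[4] eq "NY"' does per line.
sub matching_rows {
    my ($fh, $state) = @_;
    my @rows;
    while (my $line = <$fh>) {            # -n: loop over input lines
        chomp $line;                      # -l: strip the trailing newline
        my @F = split /\t/, $line;        # -a with -F'\t': split on tabs
        push @rows, $line if $F[4] eq $state;   # STATE is index 4
    }
    return @rows;
}

open my $in, '<', \$data or die "Cannot open in-memory data: $!";
my @ny = matching_rows($in, 'NY');
print "$_\n" for @ny;
printf "%d customers in NY\n", scalar @ny;
```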
3. The -n (loop) and -e (execute) flags.
It’s much easier to learn a few key Perl command-line flags and understand how they work than to memorize cryptic-looking one-liners without knowing what happens under the hood. Combining the -n and -e flags (written together as -ne) lets you run a Perl statement over every line of input, straight from the command line. It is essentially equivalent to:
while (<>) { CODE }
The -n flag wraps an implicit loop around your code, and the -e flag supplies the CODE inside that loop as a string on the command line. For example, to print the lines of some_file.txt that are shorter than 60 characters (without -l, length also counts each line’s trailing newline):
perl -ne 'print if length < 60' < some_file.txt
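A rough expansion of this example as a script makes the newline subtlety concrete: without -l, each line still ends in its newline, so length counts it. In this sketch (short_lines is a hypothetical name, and the input is an in-memory string rather than some_file.txt), a 10-character line is kept, but a line of 59 x’s is dropped because with its newline it is 60 characters long.

```perl
use strict;
use warnings;

# Roughly: perl -ne 'print if length < 60'
sub short_lines {
    my ($fh, $max) = @_;
    my @kept;
    while (<$fh>) {                            # -n: each line lands in $_
        push @kept, $_ if length($_) < $max;   # newline is still attached
    }
    return @kept;
}

my $data = "short line\n" . ('x' x 59) . "\n";
open my $in, '<', \$data or die "Cannot open in-memory data: $!";
my @kept = short_lines($in, 60);
print @kept;    # prints only "short line"
```

To count only the visible characters, add -l, which chomps each line before your code sees it: perl -lne 'print if length < 60' some_file.txt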
4. The -a (autosplit) flag.
Used together with -n (or -p), the -a flag automatically splits each input line into an array called @F. It is equivalent to:
while (<>) { @F = split /PATTERN/, $_; CODE }
By default it splits on whitespace, but you can use the -F flag to set a different delimiter or pattern. For example, the code below reads a CSV file, splits each line into columns on commas, and prints every row whose third column, $F[2], equals 5:
perl -F/,/ -ane 'print if $F[2]==5' file.csv
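The same kind of expansion works for the CSV example. The sketch below (rows_with_third_column_5 is a hypothetical name, and the sample rows are made up for illustration) also shows the main limitation of -F/,/: it splits on every comma, including commas inside quoted fields, so a row like "Smith, Jane" shifts its columns and is silently missed. For real CSV with quoting, a dedicated parser such as the CPAN module Text::CSV is the safer choice.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Roughly: perl -F/,/ -ane 'print if $F[2]==5'
sub rows_with_third_column_5 {
    my ($fh) = @_;
    my @out;
    while (my $line = <$fh>) {
        my @F = split /,/, $line;      # -F/,/ with -a: naive comma split
        # Guard the numeric test so the header and shifted rows don't warn.
        push @out, $line
            if defined $F[2] && looks_like_number($F[2]) && $F[2] == 5;
    }
    return @out;
}

# Illustrative data: the last row has a quoted comma in its name field.
my $data = <<'CSV';
id,name,qty,note
1,apple,5,fresh
2,pear,7,ripe
3,"Smith, Jane",5,quoted
CSV

open my $in, '<', \$data or die "Cannot open in-memory data: $!";
my @hits = rows_with_third_column_5($in);
print @hits;    # only the apple row; the quoted row's columns shift
```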
5. Inspecting text files – print lines of a file matching a pattern.
perl -ne '/regex/ && print' < myfile.txt
This reads myfile.txt as input and prints every line matching the pattern /regex/. Conversely, you can print the lines that don’t match the pattern with:
perl -ne '!/regex/ && print' < myfile.txt
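Both forms are the -n loop again, with the match and the print joined by a short-circuiting &&. Here is a sketch of the same idea as a reusable sub (grep_lines and its $invert switch are hypothetical names, and the log lines are made up) that returns either the matching or the non-matching lines:

```perl
use strict;
use warnings;

# Roughly: perl -ne '/regex/ && print'   ($invert false)
#      or  perl -ne '!/regex/ && print'  ($invert true)
sub grep_lines {
    my ($fh, $re, $invert) = @_;
    my @out;
    while (my $line = <$fh>) {
        my $hit = $line =~ $re;                 # true if the line matches
        push @out, $line if $invert ? !$hit : $hit;
    }
    return @out;
}

my $log = "error: disk full\nok\nerror: timeout\n";

open my $in, '<', \$log or die "Cannot open in-memory data: $!";
my @errors = grep_lines($in, qr/^error/, 0);

# Reopen the in-memory handle to read the same data a second time.
open $in, '<', \$log or die "Cannot reopen in-memory data: $!";
my @others = grep_lines($in, qr/^error/, 1);

print "matching:\n", @errors, "not matching:\n", @others;
```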
There is a very useful Stack Overflow thread that covers Perl’s command-line flags, with helpful examples. If your day-to-day work involves a lot of moving, searching, or cutting up flat files, building up a small toolkit of Perl one-liners and scripts can make your life much easier. Even if you’re a language elitist who detests Perl (and I won’t judge you if you are), I’d encourage you to check them out.