awk(1)

System Administration
Course Notes #6: awk

The awk program is similar to sed in that you specify patterns to match in a text file and what you want to have happen to the lines of the file that match those patterns. The awk program (named after its authors, Aho, Kernighan and Weinberger) goes beyond what sed can do, though, in that awk allows you to
1. specify particular fields of a matching line to manipulate and/or output
2. use variables to store values temporarily
3. perform calculations on those variables.

Another difference between sed and awk is that awk expects your text file to have rows that are separated into fields, so we would not typically use awk on a text document that just contains sentences (like an email message, or one of your answer files), but instead on a file like courses.dat, where the items in each row represent some category (like a database relation). If you want to experiment with the files, copy ~foxr/addresses.txt, ~foxr/payroll.dat, ~foxr/payroll2.dat to your home directory.

Simple awk commands

A simple awk command will look something like this:

    awk '/pattern/ { … }' filename

The pattern is a string of characters to match, a regular expression if desired, and the … consists of one or more statements that specify what to do with the matched line. The typical action is to print something out using a print statement.

Like sed, awk looks line by line for a match and, if one is found, performs the operation inside the { } marks. The print statement allows you to print values found in the line as well as literal values such as strings. To specify an item in the line, use $n to indicate the field that you want to print, where n is an integer like 1 or 2. For instance,

    awk '/KY/ { print $1,$2 }' addresses.txt

will print the first and second fields of each line in the file addresses.txt that contains the string KY. The comma between $1 and $2 tells awk to insert a delimiter between the two values, by default a space.
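
As a rough illustration of what the comma does, the sketch below runs the same kind of command on two made-up lines rather than on addresses.txt. The names and the field layout are assumptions for the example, not the contents of the real file. For contrast, it also runs the print with only a space between the fields.

```shell
# Hypothetical stand-in for addresses.txt (invented rows, assumed layout).
sample='Zappa Frank KY
Duke Mary OH'

# With the comma, print separates the two fields with a space.
with_comma=$(printf '%s\n' "$sample" | awk '/KY/ { print $1,$2 }')

# With only a space in the source, awk concatenates the two fields.
no_comma=$(printf '%s\n' "$sample" | awk '/KY/ { print $1 $2 }')

echo "$with_comma"   # Zappa Frank
echo "$no_comma"     # ZappaFrank
```

Only the KY line is printed in both cases, because the /KY/ pattern filters out the other line before the action runs.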
If we instead put only a space between $1 and $2, awk outputs the two fields without a space between them! So use the comma as a way to insert a space. Another option is to insert your own space using " " as in

    awk '/KY/ { print $1 " " $2 }' addresses.txt

Surround the awk statement with ' ' rather than " ". Inside double quotes the shell itself expands $1 and $2 before awk ever sees them, so

    awk "/KY/ {print $1$2}" addresses.txt

does not print fields 1 and 2 (if $1 and $2 are unset in the shell, awk receives {print } and prints each matching line in full). Double quotes also clash with string literals inside the print instruction. You can't do

    awk "/KY/ {print "KY resident"}" addresses.txt

because the inner quotes end the shell string early.

What is a field?

We use awk on formatted files where items on a line are delimited, commonly by a tab (but they can be delimited by a space instead). The computers.txt file is an example where the fields are office building, office number, last name, and computers in their office. We would reference these fields as $1, $2, $3, $4 (although some people had more than 1 computer, so some lines might have a $5). Use $0 to reference the entire line (all fields).

Examples:

    awk '/ST/ {print $1 $2 $3}' computers.txt
    awk '/ 3[0-9]*/ {print $0}' computers.txt
    awk '/, W/ {print $1, $2, $3, $4, $5}' computers.txt

The first one prints the first three fields, run together, of every line that contains ST. The second one finds all lines that contain a number starting with a 3 (3rd floor) and outputs the entire line. The last one finds any line that has a comma followed by a space followed by a W, that is, a last name that starts with W. This prints all fields; however, if you look at the file, one of the W's has only 1 computer. What is output for $5 if there is no fifth field? Nothing.

Forms of Comparisons

Patterns can be simple text to match or regular expressions. If you want to match from the computers.txt file anyone with a Linux or Unix computer, you might use /n[iu]x/, which appears in both Linux and Unix.
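
The bracket pattern and the empty missing field can both be tried on a few invented lines. This is only a sketch: the rows below are hypothetical and merely imitate the building, office, name, computer layout described for computers.txt.

```shell
# Hypothetical rows imitating computers.txt (building, office, name, computer).
sample='ST 301 Adams Linux
ST 415 Baker Windows
AS 220 Clark Unix'

# /n[iu]x/ matches "nix" or "nux", so it picks up both Unix and Linux.
matches=$(printf '%s\n' "$sample" | awk '/n[iu]x/ { print $3, $4 }')
echo "$matches"

# A field that does not exist is an empty string; the brackets make that visible.
missing=$(printf 'ST 301 Adams Linux\n' | awk '{ print "[" $5 "]" }')
echo "$missing"   # []
```

The first command prints the Adams and Clark lines and skips Baker, whose line contains neither "nix" nor "nux".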
For the addresses.txt file, if you wanted to find anyone who lived in either KY or KS, you could match on /K[SY]/. In addition, ^ and $ can be used in your expression. These notes avoid the {n} repetition notation, which some older awk implementations do not support. To match when the pattern is not found in a line, use ! as in

    awk '!/A/ {print $1 " does not have an A"}' foobar.txt

Use !/^…/ to make sure that a line does not start with the given pattern, as in !/^S/ to find any line that does not start with an S. Similarly, !/pattern$/ finds all lines that do not end with the pattern. The following outputs the first and last names of people from the addresses.txt file who do not have a 41*** zip code.

    awk '!/41[0-9][0-9][0-9]$/ {print $1,$2}' addresses.txt

Notice how we used [0-9][0-9][0-9] instead of [0-9]{3}. We could also have used !/41...$/, since . matches any single character.

Aside from matching a /pattern/, awk allows for comparisons of field values against values. These comparisons use the relational operators (<, >, ==, !=, <=, >=). As with the print statement, you reference a field's value using $n where n is the field number. For instance, to print all of the courses that are 3 credit hours from your courses.txt file, you would use

    awk '$3 == 3 { print $1 }' courses.txt

Notice that you could not do this simply by using /3/ as the pattern, because that would match any 3 in the line (including, say, CIT370 or a course taken in Spring2003). To print the course, semester, and number of hours for all courses worth more than 1 hour, you would use

    awk '$3 > 1 {print $1 "\t" $2 "\t" $3 }' courses.txt

Notice the use of "\t" in the print command to separate the fields with a tab. You could also just use " " to separate the fields with a space.

Another aspect of awk that differs from sed is the ability to perform calculations in the action { } portion of the command.
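
The difference between the pattern /3/ and the field comparison $3 == 3 described above is easy to see on a small sample. The three rows below are invented for this sketch and only assume the course, semester, hours layout used in these notes.

```shell
# Hypothetical rows in the assumed courses file layout: course, semester, hours.
sample='CIT370 Spring2009 3
CIT130 Fall2008 1
MAT119 Spring2003 4'

# /3/ matches a 3 anywhere in the line, so every row here qualifies.
by_regex=$(printf '%s\n' "$sample" | awk '/3/ { print $1 }')

# $3 == 3 looks only at the hours field, so just the 3-hour course is printed.
by_field=$(printf '%s\n' "$sample" | awk '$3 == 3 { print $1 }')

echo "$by_regex"
echo "$by_field"   # CIT370
```

The regex version prints all three course names, because each line happens to contain a 3 somewhere (in the course number or the year).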
For instance, if a file contained payroll information where field 2 represented hours worked and field 3 was the hourly wage, you could compute pay as $2 * $3 and either print it, as in { print "Pay: " $2*$3 }, or store it in a variable, as in { pay = $2 * $3 }.

Let's combine the use of calculations and comparisons in a more elaborate example using the payroll.dat file. This file lists the person's last name, number of hours worked and hourly wage as three fields on each line. We could use the following to output the pay of anyone who earned overtime.

    awk '$2 > 40 { print $1 "\t $" 40 * $3 + ($2 - 40) * $3 * 1.5 }' payroll.dat

This statement compares the hours field ($2) with 40 and, if it is greater, computes the pay including overtime (40 * normal wages, plus the hours over 40 * wages * 1.5). Notice that this will only output the pay for people who have earned overtime. To print everyone's pay, we could do the following

    awk '$2 > 40 { print $1 "\t $" 40 * $3 + ($2 - 40) * $3 * 1.5 }
         $2 <= 40 { print $1 "\t $" $2 * $3 }' payroll.dat

In the next section, we will improve on the logic above by using an if-then-else statement, similar to other programming languages.

You can combine conditions by using && or ||, similarly to Java. For instance, to find any employee who worked fewer than 35 hours and earns more than $20 per hour, you might use

    awk '$2 < 35 && $3 > 20 {print $1}' payroll.dat

You can also combine a condition with a regular expression. Imagine our payroll.dat file also included the state as a fourth field (payroll2.dat). If you wanted to find all employees who worked overtime and lived in KY, you could use

    awk '/KY/ && $2>40 {print $1}' payroll2.dat
or
    awk '$4=="KY" && $2>40 {print $1}' payroll2.dat

Multiple Patterns

What happens if you want to perform different operations for different patterns?
We already saw an example of this when we computed the pay of all individuals in the payroll.dat file by having two different conditions and actions ($2 > 40 and $2 <= 40). Here is the example repeated:

    awk '$2 > 40 {print $1 "\t $" ($2-40)*$3*1.5+40*$3}
         $2 <= 40 {print $1 "\t $" $2*$3}' payroll.dat

Imagine instead that anyone who works overtime and lives in KY gets double time instead of time and a half (1.5). For this example, we have three possible conditions (overtime and KY, overtime elsewhere, normal) and three actions. Because awk tests every pattern against every line, the second condition must exclude KY explicitly; otherwise a KY employee with overtime would match both of the first two patterns and be printed twice. Our code becomes:

    awk '/KY/ && $2>40 {print $1 "\t$" 40*$3+($2-40)*$3*2}
         !/KY/ && $2>40 {print $1 "\t$" 40*$3+($2-40)*$3*1.5}
         $2<=40 {print $1 "\t$" $2*$3}' payroll2.dat

The generic form of awk is

    awk '/pattern1/ {pattern1 action}
         /pattern2/ {pattern2 action}
         /pattern3/ {pattern3 action}
         …
         /lastpattern/ {lastpattern action}' filename

Here is a simple but silly example using your addresses.txt file:

    awk '/OH/ {print $1 ", go Buckeyes!"}
         /KY/ {print $1 ", go Wildcats!"}
         /IN/ {print $1 ", go Hoosiers!"}' addresses.txt

Another way to perform one of two possible operations is to use an if-else statement. This will be described later.

Variables in awk

So far, all of our actions have performed print commands. We can also use variables and assignment statements, just as in a Java program. In awk, we do not have to declare variables or initialize them (unless we want to initialize a variable to a value other than 0). We just start using them as needed. As a simple example, let's assume that you want to determine the total number of CIT credits that you are earning in Spring 2009 from the courses.dat file. You might use the following awk command:

    awk '/CIT/ && /Spring2009/ {hours+=$3}' courses.dat
or
    awk '/CIT/ && $2=="Spring2009" {hours+=$3}' courses.dat

NOTE: if you are unfamiliar with the notation +=, hours += $3 is the same as hours = hours + $3. Here, for every CIT course in Spring 09, we are adding the number of hours of the given course to the variable hours.
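
To watch += accumulate, the sketch below adds a print to the action so the running total shows after each matching line. The rows are invented for the example and only assume the course, semester, hours layout; they are not the real courses.dat.

```shell
# Hypothetical rows in the assumed courses.dat layout: course, semester, hours.
sample='CIT370 Spring2009 3
CIT380 Spring2009 3
ENG101 Spring2009 3
CIT130 Fall2008 1'

# hours is never declared; awk treats it as 0 the first time it is used.
running=$(printf '%s\n' "$sample" |
  awk '/CIT/ && $2=="Spring2009" { hours += $3; print $1, hours }')

echo "$running"
```

Only the two Spring 2009 CIT rows match, so the total goes from 3 to 6; the ENG101 and Fall2008 rows never reach the action and leave hours unchanged.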
Notice that if we wanted to count the number of CIT hours earned in all of 2008, for instance, we would have to use

    awk '/CIT/ && /2008/ {hours+=$3}' courses.dat

but could not use the second version where we have $2=="…" because the item in " " is taken literally, not as a regular expression to match. So while we can match /2008/, we could not do $2=="2008" because that would not literally match the second field (the second field is Spring2008 or Summer2008 or Fall2008). We could use:

    awk '/CIT/ && ($2=="Spring2008" || $2=="Summer2008" || $2=="Fall2008") {hours+=$3}' courses.dat

although now our conditions are getting pretty complicated.

Notice that in all of these examples, we are not outputting the value of hours, so while we are computing the number of hours, we never get to see what is computed! We could change our action to be {hours+=$3; print hours} but this would print the number of hours for every line that matched (so hours would be printed several times). For instance, if your file has CIT370 and CIT380 during Spring 2009, you would get output of 3 followed by 6 because awk would match both lines: the first time, hours starts at 0, becomes 3 and is printed out, and for