Monday, February 11, 2008

Basic Commands

bc
A calculator program that handles arbitrary precision (very large) numbers. It is useful for doing any kind of calculation on the command-line. Its use is left as an exercise.
cal [[0-12] 1-9999]
Prints out a nicely formatted calendar of the current month, a specified month, or a specified whole year. Try cal 1 for fun, and cal 9 1752, the month in which Britain dropped eleven days when it switched from the Julian to the Gregorian calendar to compensate for accumulated round-off error.
cat <filename> [<filename> ...]
Writes the contents of all the files listed to the screen. cat can join a lot of files together with cat <filename> <filename> ... > <newfile>. The file <newfile> will be an end-on-end concatenation of all the files specified.
clear
Erases all the text in the current terminal.
date
Prints out the current date and time. (The command time, though, does something entirely different.)
df
Stands for disk free and tells you how much free space is left on your system. The available space usually has the units of kilobytes (1024 bytes) (although on some other UNIX systems this will be 512 bytes or 2048 bytes). The right-most column tells the directory (in combination with any directories below that) under which that much space is available.
dircmp
Directory compare. This command compares directories to see if changes have been made between them. You will often want to see where two trees differ (e.g., check for missing files), possibly on different computers. Run man dircmp (that is, dircmp(1)). (This is a System 5 command and is not present on LINUX. You can, however, compare directories with the Midnight Commander, mc).
du
Stands for disk usage and prints out the amount of space occupied by a directory. It recurses into any subdirectories and can print only a summary with du -s <directory>. Also try du --max-depth=1 /var and du -x / on a system with /usr and /home on separate partitions.
dmesg
Prints a complete log of all messages printed to the screen during the bootup process. This is useful if you blinked when your machine was initializing. These messages might not yet be meaningful, however.
echo
Prints a message to the terminal. Try echo 'hello there', echo $[10*3+2], and echo '$[10*3+2]'. The command echo -e allows interpretation of certain backslash sequences, for example echo -e "\a", which prints a bell, or in other words, beeps the terminal. echo -n does the same as echo but without printing the trailing newline. In other words, it does not cause a wrap to the next line after the text is printed. echo -e -n "\b" prints a backspace character only, which will erase the last character printed.
exit
Logs you out.
expr
Calculates the numerical expression expression. Most arithmetic operations that you are accustomed to will work. Try expr 5 + 10 '*' 2. Observe how mathematical precedence is obeyed (i.e., the * is worked out before the +).
file
Prints out the type of data contained in a file. file portrait.jpg will tell you that portrait.jpg contains JPEG image data, JFIF standard. The command file detects an enormous number of file types, across every platform. file works by checking whether the first few bytes of a file match certain tell-tale byte sequences. The byte sequences are called magic numbers. Their complete list is stored in /usr/share/magic. [The word "magic" under UNIX normally refers to byte sequences or numbers that have a specific meaning or implication. So-called magic numbers are invented for source code, file formats, and file systems.]
free
Prints out available free memory. You will notice two listings: swap space and physical memory. These are contiguous as far as the user is concerned. The swap space is a continuation of your installed memory that exists on disk. It is slow to access but provides the illusion of much more available RAM and makes it far less likely that you will run out of memory (which can be quite fatal).
head [-n <lines>] <filename>
Prints the first <lines> lines of a file or 10 lines if the -n option is not given. (See also tail below.)
hostname [<new-name>]
With no options, hostname prints the name of your machine; otherwise it sets the name to <new-name>.
kbdrate -r -d
Changes the repeat rate of your keys. Most users will like this rate set to kbdrate -r 32 -d 250 which unfortunately is the fastest the PC can go.
more
Displays a long file by stopping at the end of each page. Run the following: ls -l /bin > bin-ls, and then try more bin-ls. The first command creates a file with the contents of the output of ls. This will be a long file because the directory /bin has a great many entries. The second command views the file. Use the space bar to page through the file. When you get bored, just press Q. You can also try ls -l /bin | more which will do the same thing in one go.
less
The GNU version of more, but with extra features. On your system, the two commands may be the same. With less, you can use the arrow keys to page up and down through the file. You can do searches by pressing / (or ? to search backward), then typing in a word to search for and pressing Enter.

lynx
Opens a URL [URL stands for Uniform Resource Locator--a web address.] at the console. Try lynx http://lwn.net/.
links
Another text-based web browser.
nohup &
Runs a command in the background, appending any output the command may produce to the file nohup.out in the current directory (or in your home directory if the current directory is not writable). nohup has the useful feature that the command will continue to run even after you have logged out. Uses for nohup will become obvious later.
sleep <seconds>
Pauses for <seconds> seconds. See also usleep.
sort
Prints a file with lines sorted in alphabetical order. Create a file called telephone with each line containing a short telephone book entry. Then type sort telephone, or sort telephone | less and see what happens. sort takes many interesting options to sort in reverse ( sort -r), to eliminate duplicate entries ( sort -u), to ignore leading whitespace ( sort -b), and so on. See sort(1) for details.
strings [-n ]
Writes out a binary file, but strips any unreadable characters. Readable groups of characters are placed on separate lines. If you have a binary file that you think may contain something interesting but looks completely garbled when viewed normally, use strings to sift out the interesting stuff: try less /bin/cp and then try strings /bin/cp. By default strings does not print sequences shorter than 4 characters. The -n option can alter this limit.
split ...
Splits a file into many separate files. This might have been used when a file was too big to be copied onto a floppy disk and needed to be split into, say, 360-KB pieces. Its sister, csplit, can split files along specified lines of text within the file. The commands are seldom used on their own but are very useful within programs that manipulate text.
tac [ ...]
Writes the contents of all the files listed to the screen, reversing the order of the lines--that is, printing the last line of the file first. tac is cat backwards and behaves similarly.
tail [-f] [-n <lines>] <filename>
Prints the last <lines> lines of a file or 10 lines if the -n option is not given. The -f option means to watch the file for lines being appended to the end of it. (See also head above.)
uname
Prints the name of the UNIX operating system you are currently using. In this case, LINUX.
uniq
Prints a file with duplicate lines deleted. uniq only removes duplicates that are adjacent, so the file must first be sorted.
usleep <microseconds>
Pauses for <microseconds> microseconds (1/1,000,000 of a second).
wc [-c] [-w] [-l]
Counts the number of bytes (with -c for character), or words (with -w), or lines (with -l) in a file.
whatis <command>
Gives the first line of the man page corresponding to <command>, unless no such page exists, in which case it prints "nothing appropriate".
whoami
Prints your login name.
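
Several of the commands above are easiest to understand when chained together. The following is only a sketch: the telephone file and its entries are made up for illustration, and it runs in a scratch directory so that nothing in your home directory is touched.

```shell
# Work in a scratch directory (hypothetical data, safe to delete).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# A made-up telephone file containing one duplicate entry.
printf '%s\n' 'Mary 555-1234' 'Adam 555-9876' 'Zoe 555-0000' 'Adam 555-9876' > telephone

unique=$(sort telephone | uniq)               # sorted, adjacent duplicates removed
first=$(sort telephone | head -n 1)           # first line of the sorted output
last=$(sort telephone | tail -n 1)            # last line of the sorted output
lines=$(wc -l < telephone | tr -d ' ')        # line count (tr strips padding)
sum=$(expr 5 + 10 '*' 2)                      # * is worked out before +

echo "$unique"
echo "first: $first"
echo "last: $last"
echo "lines: $lines, expr result: $sum"
```

Note that uniq sees the output of sort, not the original file, which is why the duplicate Adam entry disappears.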

Compressed Files

Files typically contain a lot of data that one can imagine might be represented with a smaller number of bytes. Take for example the letter you typed out. The word "the" was probably repeated many times. You were probably also using lowercase letters most of the time. The file was by far not a completely random set of bytes, and it repeatedly used spaces as well as using some letters more than others. [English text in fact contains, on average, only about 1.3 useful bits (there are eight bits in a byte) of data per byte.] Because of this the file can be compressed to take up less space. Compression involves representing the same data by using a smaller number of bytes, in such a way that the original data can be reconstructed exactly. This usually involves finding patterns in the data. The command to compress a file is gzip <filename>, which stands for GNU zip. Run gzip on a file in your home directory and then run ls to see what happened. Now, use more to view the compressed file. To uncompress the file use gzip -d <filename>. Now, use more to view the file again. Many files on the system are stored in compressed format. For example, man pages are often stored compressed and are uncompressed automatically when you read them.

You previously used the command cat to view a file. You can use the command zcat to do the same thing with a compressed file. Gzip a file and then type zcat <filename>. You will see that the contents of the file are written to the screen. Generally, when commands and files have a z in them they have something to do with compression--the letter z stands for zip. You can use zcat <filename> | less to view a compressed file properly. You can also use the command zless <filename>, which does the same as zcat <filename> | less. (Note that your less may actually have the functionality of zless combined.)
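
The compress, view, and uncompress cycle described above can be sketched as follows. The file name letter.txt and its contents are invented for the example, and gzip -dc is used in place of zcat because it writes the same uncompressed output and behaves identically across systems.

```shell
# Scratch directory with a made-up, highly repetitive text file.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

i=0
while [ "$i" -lt 200 ]; do
    echo 'the quick brown fox jumps over the lazy dog'
    i=$((i + 1))
done > letter.txt

cp letter.txt letter.orig                      # keep a copy for comparison
before=$(wc -c < letter.txt | tr -d ' ')       # size in bytes before compression
gzip letter.txt                                # replaces letter.txt with letter.txt.gz
after=$(wc -c < letter.txt.gz | tr -d ' ')     # size in bytes after compression
echo "bytes before: $before, bytes after: $after"

# View the contents without uncompressing the file on disk.
gzip -dc letter.txt.gz | head -n 1

# Restore the original and check that it is byte-for-byte identical.
gzip -d letter.txt.gz
roundtrip=failed
if cmp -s letter.txt letter.orig; then
    roundtrip=ok
fi
echo "round trip: $roundtrip"
```

Repetitive text like this compresses far better than typical files, so do not expect the same ratio on real data.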

A new addition to the arsenal is bzip2. This is a compression program very much like gzip, except that it is slower and compresses 20%-30% better. It is useful for compressing files that will be downloaded from the Internet (to reduce the transfer volume). Files that are compressed with bzip2 have an extension .bz2. Note that the improvement in compression depends very much on the type of data being compressed. Sometimes there will be negligible size reduction at the expense of a huge speed penalty, while occasionally it is well worth it. Files that are frequently compressed and uncompressed should never use bzip2.

Searching for Files

You can use the command find to search for files. Change to the root directory, and enter find. It will spew out all the files it can see by recursively descending [Goes into each subdirectory and all its subdirectories, and repeats the command find. ] into all subdirectories. In other words, find, when executed from the root directory, prints all the files on the system. find will work for a long time if you enter it as you have--press Ctrl-C to stop it.

Now change back to your home directory and type find again. You will see all your personal files. You can specify a number of options to find to look for specific files.

find -type d
Shows only directories and not the files they contain.
find -type f
Shows only files and not the directories that contain them, even though it will still descend into all directories.
find -name <filename>
Finds only files that have the name <filename>. For instance, find -name '*.c' will find all files that end in a .c extension ( find -name *.c without the quote characters will not work. You will see why later). find -name Mary_Jones.letter will find the file with the name Mary_Jones.letter.
find -size [[+|-]]<size>k
Finds only files that have a size larger (for +) or smaller (for -) than <size> kilobytes, or the same as <size> kilobytes if the sign is not specified. (The k suffix selects kilobytes; without a suffix, find counts in 512-byte blocks.)
find <directory> [<directory> ...]
Starts find in each of the specified directories.

There are many more options for doing just about any type of search for a file. See find(1) for more details (that is, run man 1 find). Look also at the -exec option which causes find to execute a command for each file it finds, for example:


find /usr -type f -exec ls '-al' '{}' ';'
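
To experiment with these options without waiting for a search of the whole system, you can build a small directory tree and search that instead. This is a sketch only: the directory and file names below are hypothetical.

```shell
# Build a small, made-up tree to search.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/src/lib"
touch "$tmpdir/src/main.c" "$tmpdir/src/lib/util.c" "$tmpdir/src/notes.txt"

# Count directories only (the scratch directory itself, src, and src/lib).
dirs=$(find "$tmpdir" -type d | wc -l | tr -d ' ')
echo "directories: $dirs"

# List regular files ending in .c; the quotes stop the shell expanding *.c.
cfiles=$(find "$tmpdir" -type f -name '*.c' | sort)
echo "$cfiles"

# -exec runs a command once per match: '{}' is replaced by the file name
# and ';' marks the end of the command.
find "$tmpdir" -type f -name '*.txt' -exec ls -l '{}' ';'
```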

find has the deficiency of actively reading directories to find files. This process is slow, especially when you start from the root directory. An alternative command is locate . This searches through a previously created database of all the files on the system and hence finds files instantaneously. Its counterpart updatedb updates the database of files used by locate. On some systems, updatedb runs automatically every day at 04h00.

Try these ( updatedb will take several minutes):

updatedb

locate rpm
locate deb
locate passwd
locate HOWTO
locate README


Searching Within Files

Very often you will want to search through a number of files to find a particular word or phrase, for example, when a number of files contain lists of telephone numbers with people's names and addresses. The command grep does a line-by-line search through a file and prints only those lines that contain a word that you have specified. grep has the command summary:


grep [options] <pattern> <filename> [<filename> ...]

[The words word, string, or pattern are used synonymously in this context, basically meaning a short length of letters and/or numbers that you are trying to find matches for. A pattern can also be a string with kinds of wildcards in it that match different characters, as we shall see later.]

Run grep for the word "the" to display all lines containing it: grep 'the' Mary_Jones.letter. Now try grep 'the' *.letter.

grep -n
shows the line number in the file where the word was found.
grep -<num>
prints out <num> of the lines that came before and after each of the lines in which the word was found.
grep -A <num>
prints out <num> of the lines that came After each of the lines in which the word was found.
grep -B <num>
prints out <num> of the lines that came Before each of the lines in which the word was found.
grep -v
prints out only those lines that do not contain the word you are searching for. [ You may think that the -v option is no longer doing the same kind of thing that grep is advertised to do: i.e., searching for strings. In fact, UNIX commands often suffer from this--they have such versatility that their functionality often overlaps with that of other commands. One actually never stops learning new and nifty ways of doing things hidden in the dark corners of man pages.]
grep -i
does the same as an ordinary grep but is case insensitive.
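
The options above can be tried on a small sample file. The letter below is invented for illustration; only the grep options themselves come from the text.

```shell
# A made-up five-line letter in a scratch directory.
tmpdir=$(mktemp -d)
letter="$tmpdir/Mary_Jones.letter"
printf '%s\n' 'Dear Mary,' 'The parcel arrived.' 'I opened the box.' \
    'Thanks again.' 'Adam' > "$letter"

plain=$(grep 'the' "$letter")           # only lines containing lowercase "the"
numbered=$(grep -n 'the' "$letter")     # same, prefixed with the line number
nocase=$(grep -i -c 'the' "$letter")    # -c counts matches; -i ignores case
inverted=$(grep -v -c 'the' "$letter")  # count of lines NOT containing "the"
context=$(grep -A 1 'the' "$letter")    # each match plus one line after it

echo "$numbered"
echo "case-insensitive matches: $nocase, non-matching lines: $inverted"
echo "$context"
```

The -c option is not covered above; it is used here only to turn the matches into a count.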
Regular Expressions

A regular expression is a sequence of characters that forms a template used to search for strings [Words, phrases, or just about any sequence of characters.] within text. In other words, it is a search pattern. To get an idea of when you would need to do this, consider the example of having a list of names and telephone numbers. If you want to find a telephone number that contains a 3 in the second place and ends with an 8, regular expressions provide a way of doing that kind of search. Or consider the case where you would like to send an email to fifty people, replacing the word after the "Dear" with their own name to make the letter more personal. Regular expressions allow for this type of searching and replacing.

Overview

Many utilities use the regular expression to give them greater power when manipulating text. The grep command is an example. Previously you used the grep command to locate only simple letter sequences in text. Now we will use it to search for regular expressions.

In the previous chapter you learned that the ? character can be used to signify that any character can take its place. This is said to be a wildcard and works with file names. With regular expressions, the wildcard to use is the . character. So, you can use the command grep .3....8 to find the seven-character telephone number that you are looking for in the above example.

Regular expressions are used for line-by-line searches. For instance, if the seven characters were spread over two lines (i.e., they had a line break in the middle), then grep wouldn't find them. In general, a program that uses regular expressions will consider searches one line at a time.

Here are some regular expression examples that will teach you the regular expression basics. We use the grep command to show the use of regular expressions (remember that the -w option matches whole words only). Here the expression itself is enclosed in ' quotes for reasons that are explained later.

grep -w 't[a-i]e'
Matches the words tee, the, and tie. The brackets have a special significance. They mean to match one character that can be anything from a to i.
grep -w 't[i-z]e'
Matches the words tie and toe.
grep -w 'cr[a-m]*t'
Matches the words craft, credit, and cricket. The * means to match any number of the previous character, which in this case is any character from a through m.
grep -w 'kr.*n'
Matches the words kremlin and krypton, because the . matches any character and the * means to match the dot any number of times.
egrep -w '(th|sh).*rt'
Matches the words shirt, short, and thwart. The | means to match either the th or the sh. egrep is just like grep but supports extended regular expressions that allow for the | feature. [ The | character often denotes a logical OR, meaning that either the thing on the left or the right of the | is applicable. This is true of many programming languages. ] Note how the square brackets mean one-of-several-characters and the round brackets with |'s mean one-of-several-words.
grep -w 'thr[aeiou]*t'
Matches the words threat and throat. As you can see, a list of possible characters can be placed inside the square brackets.
grep -w 'thr[^a-f]*t'
Matches the words throughput and thrust. The ^ after the first bracket means to match any character except the characters listed. For example, the word thrift is not matched because it contains an f.

The above regular expressions all match whole words (because of the -w option). If the -w option were not present, they might match parts of words, resulting in a far greater number of matches. Also note that although the * means to match any number of characters, it will also match no characters at all; for example, t[a-i]*e could actually match the letter sequence te, that is, a t and an e with zero characters between them.
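
You can check these patterns against a word list of your own. The list below is a small made-up sample (your system dictionary will produce many more matches), and LC_ALL=C is set as a precaution so that ranges such as [a-i] cover exactly the plain lowercase letters.

```shell
# A made-up word list, one word per line.
tmpdir=$(mktemp -d)
words="$tmpdir/words"
printf '%s\n' tee the tie toe tue craft credit cricket crust \
    kremlin krypton threat throat thrift thrust > "$words"

# paste -s joins the matching lines into a single space-separated line.
a=$(LC_ALL=C grep -w 't[a-i]e' "$words" | paste -s -d ' ' -)
b=$(LC_ALL=C grep -w 't[i-z]e' "$words" | paste -s -d ' ' -)
c=$(LC_ALL=C grep -w 'cr[a-m]*t' "$words" | paste -s -d ' ' -)
d=$(LC_ALL=C grep -w 'thr[^a-f]*t' "$words" | paste -s -d ' ' -)

# * also matches zero characters, so the bare sequence "te" matches.
zero=$(echo te | LC_ALL=C grep -c 't[a-i]*e')

echo "t[a-i]e:     $a"
echo "t[i-z]e:     $b"
echo "cr[a-m]*t:   $c"
echo "thr[^a-f]*t: $d"
echo "lines matching t[a-i]*e in 'te': $zero"
```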

Usually, you will use regular expressions to search for whole lines that match, and sometimes you would like to match a line that begins or ends with a certain string. The ^ character specifies the beginning of a line, and the $ character the end of the line. For example, ^The matches all lines that start with a The, and hack$ matches all lines that end with hack, and '^ *The.*hack *$' matches all lines that begin with The and end with hack, even if there is whitespace at the beginning or end of the line.
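
A short experiment shows how the two anchors behave. The four sample lines are invented, and grep -c is used simply to count how many lines each pattern matches.

```shell
# Four made-up lines; the second has leading and trailing spaces.
tmpdir=$(mktemp -d)
f="$tmpdir/lines"
printf '%s\n' 'The old hack' '  The new hack  ' 'Not The hack' 'The hacker' > "$f"

starts=$(grep -c '^The' "$f")            # lines beginning with The
ends=$(grep -c 'hack$' "$f")             # lines ending with hack
both=$(grep -c '^ *The.*hack *$' "$f")   # both, tolerating surrounding spaces

echo "start with The: $starts"
echo "end with hack:  $ends"
echo "both (spaces allowed): $both"
```

The second line is missed by the first two patterns because of its surrounding spaces, but the third pattern allows for them.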

Because regular expressions use certain characters in a special way (these are . \ [ ] * + ?), these characters cannot be used directly to match themselves. This restriction would severely limit you when trying to match, say, file names, which often use the . character. To match a . you can use the sequence \. which forces interpretation as an actual . and not as a wildcard. Hence, the regular expression myfile.txt might match the letter sequence myfileqtxt or myfile.txt, but the regular expression myfile\.txt will match only myfile.txt.

You can specify most special characters by adding a \ character before them, for example, use \[ for an actual [, a \$ for an actual $, a \\ for an actual \, \+ for an actual +, and \? for an actual ?. ( ? and + are explained below.)
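
The myfile.txt example can be verified directly. This is a minimal sketch using a two-line scratch file:

```shell
# Two candidate strings, one per line.
tmpdir=$(mktemp -d)
f="$tmpdir/names"
printf '%s\n' 'myfile.txt' 'myfileqtxt' > "$f"

loose=$(grep -c 'myfile.txt' "$f")     # unescaped . matches any character
strict=$(grep -c 'myfile\.txt' "$f")   # \. matches only a literal dot

echo "unescaped dot matches $loose lines"
echo "escaped dot matches $strict line"
```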

The fgrep Command

fgrep is an alternative to grep. The difference is that while grep (the more commonly used command) matches regular expressions, fgrep matches literal strings. In other words you can use fgrep when you would like to search for an ordinary string that is not a regular expression, instead of preceding special characters with \.
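
The difference is easy to see with a string that contains special characters. The sketch below uses grep -F, which is the standard way of requesting the same literal-string matching as fgrep (some newer systems print a warning when fgrep itself is run); the sample lines are made up.

```shell
# Two made-up lines; only the first contains the literal strings searched for.
tmpdir=$(mktemp -d)
f="$tmpdir/prices"
printf '%s\n' 'price[1] = 3.50' 'price1 = 3x50' > "$f"

as_regex=$(grep -c '3.50' "$f")            # . is a wildcard: both lines match
as_literal=$(grep -F -c '3.50' "$f")       # literal 3.50: first line only
bracket=$(grep -F 'price[1]' "$f")         # [1] taken literally, no escaping

echo "regular expression 3.50 matches $as_regex lines"
echo "literal string 3.50 matches $as_literal line"
echo "$bracket"
```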

Regular Expression \{ \} Notation

x* matches zero to infinite instances of a character x. You can specify other ranges of numbers of characters to be matched with, for example, x\{3,5\}, which will match at least three but not more than five x's, that is xxx, xxxx, or xxxxx.

x\{4\} can then be used to match 4 x's exactly: no more and no less. x\{7,\} will match seven or more x's--the upper limit is omitted to mean that there is no maximum number of x's.

As in all the examples above, the x can be a range of characters (like [a-k]) just as well as a single character.

grep -w 'th[a-t]\{2,3\}t'
Matches the words theft, thirst, threat, thrift, and throat.
grep -w 'th[a-t]\{4,5\}t'
Matches the words theorist, thicket, and thinnest.
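
These two commands can be tried on a small word list. The list below is a made-up sample containing the words mentioned above plus one extra (that) which neither pattern should match; LC_ALL=C is a precaution so that [a-t] means exactly the lowercase letters a through t.

```shell
# A made-up word list, one word per line.
tmpdir=$(mktemp -d)
words="$tmpdir/words"
printf '%s\n' theft thirst threat thrift throat theorist thicket thinnest that > "$words"

short=$(LC_ALL=C grep -w 'th[a-t]\{2,3\}t' "$words" | paste -s -d ' ' -)
long=$(LC_ALL=C grep -w 'th[a-t]\{4,5\}t' "$words" | paste -s -d ' ' -)

# \{4\} means exactly four: xxxx matches, xxxxx does not.
four=$(echo xxxx | grep -c '^x\{4\}$')
five=$(echo xxxxx | grep -c '^x\{4\}$')

echo "2 to 3 letters: $short"
echo "4 to 5 letters: $long"
echo "xxxx matches: $four, xxxxx matches: $five"
```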


Extended Regular Expression + ? \< \> ( ) | Notation with egrep

An enhanced version of regular expressions allows for a few more useful features. Where these conflict with existing notation, they are only available through the egrep command.

+
is analogous to \{1,\}. It does the same as * but matches one or more characters instead of zero or more characters.
?
is analogous to \{0,1\}. It matches zero or one character.
\< \>
can surround a string to match only whole words.
( )
can surround several strings, separated by |. This notation will match any of these strings. ( egrep only.)
\( \)
can surround several strings, separated by \|. This notation will match any of these strings. ( grep only.)

The following examples should make the last two notations clearer.

grep 'trot'
Matches the words electrotherapist, betroth, and so on, but
grep '\<trot\>'
matches only the word trot.
egrep -w '(this|that|c[aeiou]*t)'
Matches the words this, that, cot, coat, cat, and cut.
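
The extended notation can be tried out as follows. This sketch uses grep -E, which is the standard way of requesting the same extended regular expressions as egrep; the word list is made up, and the \< \> word anchors are assumed to be available, as they are in GNU grep.

```shell
# A made-up word list, one word per line.
tmpdir=$(mktemp -d)
words="$tmpdir/words"
printf '%s\n' trot betroth electrotherapist this that cot coat cat cut ct \
    shirt short thwart > "$words"

anywhere=$(grep -c 'trot' "$words")                 # trot inside longer words too
whole=$(grep '\<trot\>' "$words")                   # trot as a whole word only
alt=$(grep -E -w '(this|that|c[aeiou]*t)' "$words" | paste -s -d ' ' -)
either=$(grep -E -w '(th|sh).*rt' "$words" | paste -s -d ' ' -)
plus=$(grep -E -c '^co+t$' "$words")                # one or more o: cot
optional=$(grep -E -c '^co?t$' "$words")            # zero or one o: cot and ct

echo "lines containing trot: $anywhere, whole word: $whole"
echo "alternatives: $alt"
echo "th or sh ... rt: $either"
echo "co+t: $plus, co?t: $optional"
```

The word ct appears in the alternatives because * allows zero vowels between the c and the t.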
