IEEE P1003.2 Draft 11.2 - September 1991

Copyright (c) 1991 by the Institute of Electrical and Electronics Engineers, Inc.
345 East 47th Street
New York, NY 10017, USA
All rights reserved as an unpublished work.

This is an unapproved and unpublished IEEE Standards Draft, subject to change. The publication, distribution, or copying of this draft, as well as all derivative works based on this draft, is expressly prohibited except as set forth below.

Permission is hereby granted for IEEE Standards Committee participants to reproduce this document for purposes of IEEE standardization activities only, and subject to the restrictions contained herein.

Permission is hereby also granted for member bodies and technical committees of ISO and IEC to reproduce this document for purposes of developing a national position, subject to the restrictions contained herein.

Permission is hereby also granted to the preceding entities to make limited copies of this document in an electronic form only for the stated activities.

The following restrictions apply to reproducing or transmitting the document in any form:
1) all copies or portions thereof must identify the document's IEEE project number and draft number, and must be accompanied by this entire notice in a prominent location;
2) no portion of this document may be redistributed in any modified or abridged form without the prior approval of the IEEE Standards Department.

Other entities seeking permission to reproduce this document, or any portion thereof, for standardization or other activities, must contact the IEEE Standards Department for the appropriate license.

Use of information contained in this unapproved draft is at your own risk.

IEEE Standards Department
Copyright and Permissions
445 Hoes Lane, P.O. Box 1331
Piscataway, NJ 08855-1331, USA
+1 (908) 562-3800
+1 (908) 562-1571 [FAX]

P1003.2 Draft 11.2
ISO/IEC CD 9945-2.2

STANDARDS PROJECT

Draft Standard for Information Technology --
Portable Operating System Interface (POSIX)
Part 2: Shell and Utilities

Sponsor
Technical Committee on Operating Systems and Application Environments
of the IEEE Computer Society

Work Item Number: JTC 1.22.21.2

Abstract: ISO/IEC 9945-2: 199x (IEEE Std 1003.2-199x) is part of the POSIX series of standards for applications and user interfaces to open systems. It defines the applications interface to a shell command language and a set of utility programs for complex data manipulation.

Keywords: API, application portability, data processing, open systems, operating system, portable application, POSIX, shell and utilities

P1003.2 / D11.2 September 1991

Copyright (c) 1991 by the Institute of Electrical and Electronics Engineers, Inc.
345 East 47th Street
New York, NY 10017, USA
All rights reserved.

This is an unapproved IEEE Standards Draft, subject to change. Permission is hereby granted for IEEE Standards Committee participants to reproduce this document for purposes of IEEE standardization activities. Permission is also granted for member bodies and technical committees of ISO and IEC to reproduce this document for purposes of developing a national position.
Other entities seeking permission to reproduce this document for standardization or other activities, or to reproduce portions of this document for these or other uses, must contact the IEEE Standards Department for the appropriate license. Use of information contained in this unapproved draft is at your own risk.

IEEE Standards Department
Copyright and Permissions
445 Hoes Lane, P.O. Box 1331
Piscataway, NJ 08855-1331, USA
+1 (908) 562-3800
+1 (908) 562-1571 [FAX]

BEGIN_RATIONALE

Editor's Notes

The IEEE ballot for Draft 11.2 is due at the IEEE Standards Office on 21 October 1991. You are also asked to e-mail any balloting comments to me: hlj@posix.com. Please read the balloting instructions in Annex G.

This document is also registered as ISO/IEC CD 9945-2.2. The international balloting period is unrelated to the IEEE balloting. Member bodies, please consult any accompanying materials from SC22.

Also, please read the remainder of these Editor's Notes for explanations of the stylistic differences between a draft and the final standard (copyright notices, inline rationale, etc.).

The IEEE balloting will be on hiatus during the international balloting period, which will probably conclude at the May 1992 WG15 meeting. This is in accordance with the WG15 Synchronization Plan, which calls for coordinated balloting to result in the approval of an IEEE/ANSI standard that is identical to the ISO/IEC Draft International Standard (DIS). There will be a final recirculation of a full draft (Draft 12) to the IEEE balloting group before it is sent to the Standards Board.
This section will not appear in the final document. It is used for editorial comments concerning this draft. Draft 11.2 is the fifth recirculation of the balloting process that began in December 1988 with Draft 8. Please consult Annex G and the ballot cover letter that accompanied this draft for information on how the recirculation is accomplished.

This draft uses small numbers in the right margin in lieu of change bars. ``2'' denotes changes from Draft 11.1 to Draft 11.2. ``1'' denotes changes from Draft 11 to Draft 11.1. All diff-marks prior to Draft 11.1 have been removed. Trivial informative (i.e., non-normative) changes and purely editorial changes such as grammar, spelling, or cross references are not diff-marked.

There are two versions of Draft 11.2 in circulation. The full printed version was sent for SC22 balloting and is also available from the IEEE for a duplication fee [call (800) 678-IEEE, or +1 (908) 981-1393 outside the US]. The version sent to the IEEE balloting group consists (mostly) of pages containing normative changes. This was done to focus balloting group attention on the changes being balloted and to reduce costs and administrative time. The changes-only version contains a few handwritten pointers in the margins to show context where it would not be obvious; numbers near the normal page numbers show what the corresponding Draft 11 page number would be.

Copyright (c) 1991 IEEE. All rights reserved. This is an unapproved IEEE Standards Draft, subject to change.

The following minor global changes have been made without diff-marks:

- Instances of the verbs ``print,'' ``report,'' ``display,'' ``issue,'' and ``list'' are being changed to ``write'' as part of a general cleanup related to the UPE, where ``write'' and ``display'' have precise meanings. This cleanup is probably incomplete and will continue throughout ballot resolution and the final editing process.
ISO and IEEE have tightened the requirements for the use of ``shall.'' We have been directed that all sentences that are currently declarative must be changed to use the ``shall'' form if they state a requirement: ``The status is zero'' -> ``The status shall be zero.'' One specific instance of this was changing ``The following options/operands are available'' to ``The following options/operands shall be supported by the implementation.'' Another: ``The foo utility follows the utility argument syntax standard described in 2.11.2'' to ``The foo utility shall conform to the utility argument syntax guidelines described in 2.10.2.'' These translations are tedious and are not yet complete; they will be completed on a draft-by-draft basis. In the meantime, please assume that all declarative sentences are meant to use ``shall'' and treat them as either implementation or application requirements unless they specifically say ``may,'' ``should,'' or ``can.''

The rationale text for all the sections has been temporarily moved from Annex E and interspersed with the appropriate sections. The rationale sections are identified with the phrase ``(This subclause is not a part of P1003.2)'' in the heading. This colocation of rationale with its accompanying text was done to encourage the Technical Reviewers to maintain the rationale text, as well as to provide explanations to the reviewers and balloters. Not all of the Rationale sections have content as of this draft. The empty sections may be somewhat distracting, but we feel it is imperative to keep them there to encourage the Technical Reviewers to provide rationale as needed.
Please report typographical errors to:

    Hal Jespersen
    POSIX Software Group
    447 Lakeview Way
    Redwood City, CA 94062
    +1 (415) 364-3410
    FAX: +1 (415) 364-4498
    Email: hlj@Posix.COM

(Electronic mail is preferred.)

The copying and distribution of IEEE balloting drafts is accomplished by the Standards Office. To report problems with reproduction of your copy, contact:

    Anna Kaczmarek
    IEEE Standards Office
    P.O. Box 1331
    445 Hoes Lane
    Piscataway, NJ 08855-1331
    +1 (908) 562-3811
    FAX: +1 (908) 562-1571

Additional copies of this draft are available for a duplication and mailing fee. Contact:

    IEEE Publications
    1 (800) 678-IEEE
    +1 (908) 981-1393 [outside US]

This draft is available in various electronic forms to assist the review process. Our thanks to Andrew Hume of AT&T Bell Laboratories for providing online access facilities. Note that this is a limited experiment in providing online access; future ballots may provide other forms, such as diskettes or a bulletin board arrangement, but the instructions shown here are the only methods currently available. Please also observe the additional copyright restrictions that are described in the online files.

Assuming you have access to the Internet, the scenario is approximately:

    ftp research.att.com      # research's IP address is 192.20.225.2
    cd posix/p1003.2/d11.2
    get toc
    get index
    binary
    get p11-20.Z

The draft is available in several forms. The table of contents can be found in the file toc; pages containing a particular section are stored under the section number; sets of pages are stored in files with names of the form pn-m, where n and m are the first and last page numbers (for example, p11-20); and the entire draft is stored in the file all. By default, files are ASCII. A .ps suffix indicates PostScript. A .Z suffix indicates a compress'ed file. The file index contains a general description of the files available.
These files are also available via electronic mail by sending a message like

    send 3.4 3.5 9.2 from posix/p1003.2/d11.2

to netlib@research.att.com. If you use email, you should not ask for the compressed version. For a more complete introduction to netlib, send the message

    send help

POSIX.2 Change History

This section is provided to track major changes between drafts. Because it was first added in Draft 11, the entries for earlier drafts are less detailed.

Draft 11.2 [September 1991] Sixth IEEE ballot (fifth recirculation; only changed pages distributed). Second ISO/IEC CD 9945-2 registration (full draft distributed).
    - The use of equivalence classes as starting or ending points of range expressions in regular expression bracket expressions has been made unspecified.
    - The LC_COLLATE substitute keyword has been deleted.
    - cksum (4.9): Modifications to the algorithm.
    - cp (4.13): Restoration of the
    - stty (4.59): Addition of the tostop operand.
    - lex (A.2): Further clarification of ERE differences.
    - Miscellaneous clarifications to various utilities.

Draft 11.1 [June 1991] Fifth IEEE ballot (fourth recirculation; only changed pages distributed).
    - Modification of the definition of byte and clarifications of octal/hexadecimal byte representations throughout the utilities.
    - Clarifications to the locale definition source file description in 2.5; addition of a yacc grammar.
    - Removal of the pax -e character translation option.
    - Miscellaneous clarifications to various utilities.
    - Reconciliation of feature test macros and headers in Annex B with POSIX.1.

Draft 11 [February 1991] Fourth IEEE ballot (third recirculation).
    - Changes in 2.3 to the treatment of regular built-ins with regard to their exec-able versions.
    - Changes to 2.4 (character names and charmap syntax) and 2.5 (localedef input format) as a result of international balloting. Addition of the {POSIX2_LOCALEDEF} symbol.
    - Changes to the shell quoting rules, arithmetic expression syntax, command search order, error descriptions, and exportable functions.
    - Movement of the command utility from special built-in status to a utility in Section 4.
    - cp (4.13): Significant clarifications and interface changes.
    - date (4.15): Added field descriptor modifiers to handle alternate calendar forms when supported by the locale and implementation.
    - pax (4.48): Significant interface changes, including international character set translations.
    - test (4.62): Deprecated some functionality because of inconsistent behavior in existing implementations that causes portability problems in existing applications.
    - make (6.2): Addition of the .POSIX special target; return of some rules to strict existing practice.
    - Miscellaneous clarifications to various utilities.
    - The FORTRAN section now has two options associated with it: Development Utilities (fort77) and Runtime Utilities (asa).
    - Addition of full example profiles and charmaps from Denmark in Annex F.

Draft 10 [July 1990] Third IEEE ballot (second recirculation).
    - This draft is primarily one of clarification and amplification. In resolving ballot objections, large portions of the draft have been rewritten, affecting all sections, but comparatively few changes in [intended] functionality have occurred.
    - New shell command language features (see Section 3):
    - Utility name changes:

          Draft 9     Draft 10
          _______     ________
          create      pathchk
          hexdump     od
          sendto      mailx

    - A few of the utilities and global sections now have a more formal description, using a yacc-like grammar.
    - Considerably more detail has been added to the internationalization features of the standard: global changes to clauses 2.4 and 2.5; new detail to the LC_* variables in each utility section; specification of LC_MESSAGES (replacing LC_RESPONSE).
    - Due to some ISO requirements, Sections 1 and 2 have been reorganized yet again, causing many cross reference number changes. The Related Standards annex has been turned into simply a Bibliography. The Non-Specified Language Compilers annex has been replaced by a Sample National Profile annex.

Draft 9 [August 1989] Second IEEE ballot (first recirculation). Also registered as ISO/IEC CD 9945-2.1. A few minor corrections to some sections. :-)

Draft 8 [December 1988] First IEEE ballot. Also submitted to ISO/IEC JTC 1/SC22 for review and comment.

Draft 7 [September 1988] ``Mock ballot'' conducted by working group members only.

POSIX.2 Technical Reviewers

The individuals listed in Table i are the Technical Reviewers for this draft. During balloting they are the subject matter experts who coordinate the resolution process for specific sections, as shown.
Table i - POSIX.2 Technical Reviewers

____________________________________________________________________
Section   Description                                   Reviewer
____________________________________________________________________
1         General                                       Jespersen
2.4, 2.5  Definitions (Locales)                         Leijonhufvud
2 (rest)  Definitions (Various)                         Jespersen
3         Command Language                              Jespersen
4         Execution Environment Utilities: cp, rm       Bostic
4         Execution Environment Utilities: (the rest)   Jespersen
6         Software Development Utilities                Jespersen
7         Language-Independent Bindings                 Jespersen
A         C Development Utilities                       Jespersen
B         C Bindings                                    Jespersen
C         FORTRAN Development and Runtime Utilities     Jespersen
D-G       Various                                       Jespersen
____________________________________________________________________

Also, our special thanks to Donn Terry for writing or improving all the yacc-based grammars used in Draft 10.

POSIX.2 Proposed Schedule

This section will not appear in the final document. It is used to provide editorial notes regarding the proposed POSIX.2 schedule. In the schedule, UPE stands for ``User Portability Extension.''

+------------------------+----------------------------------------+-------+
| Date                   | Milestone (End of Meeting)             | Draft |
+------------------------+----------------------------------------+-------+
| Sep 7-11, 1987         | Utility format frozen;                 | 3     |
| Nashua, NH             | 10% of utilities described.            |       |
+------------------------+----------------------------------------+-------+
| Dec 7-14, 1987         | 50% of utilities described;            | 4     |
| San Diego, CA          | shell update; substantial              |       |
|                        | progress in Sections 2, 3, 4, 8.       |       |
+------------------------+----------------------------------------+-------+
| Mar 14-18, 1988        | Utility selection frozen;              | 5     |
| Washington, DC         | 75% described.                         |       |
+------------------------+----------------------------------------+-------+
| Jul 11-15, 1988        | 100% utilities described;              | 6     |
| Denver, CO             | functional freeze; produce ``mock      |       |
|                        | ballot'' and POSIX FIPS draft 7        |       |
+------------------------+----------------------------------------+-------+
| [Sep-Oct 1988]         | [Mock ballot]                          | 7     |
+------------------------+----------------------------------------+-------+
| Oct 24-28, 1988        | Resolve mock ballot objections;        | 7     |
| Honolulu, HI           | produce first real ballot (draft 8)    |       |
|                        | UPE planning begins                    |       |
+------------------------+----------------------------------------+-------+
| [Jan-Feb 1989]         | [First ballot]                         | 8     |
+------------------------+----------------------------------------+-------+
| Jan 9-11, 1989         | Begin UPE definitions;                 | 8     |
| Ft. Lauderdale, FL     | Technical Reviewer coordination        |       |
|                        | of first ballot responses              |       |
+------------------------+----------------------------------------+-------+
| [Feb-Apr 1989]         | [Ballot resolution]                    | 8     |
+------------------------+----------------------------------------+-------+
| Apr 24-28, 1989        | Working Group concurrence with         | 9     |
| Minneapolis, MN        | ballot resolution; produce Draft 9     |       |
|                        | for recirculation; UPE work            |       |
+------------------------+----------------------------------------+-------+
| Jul 10-14, 1989        | UPE work                               |       |
| San Jose, CA           |                                        |       |
+------------------------+----------------------------------------+-------+
| [Oct 1989]             | [First Recirculation]                  | 9     |
+------------------------+----------------------------------------+-------+
| [Nov-Feb 1990]         | [Ballot resolution]                    | 9     |
+------------------------+----------------------------------------+-------+
| [Aug-Sep 1990]         | [Second Recirculation]                 | 10    |
+------------------------+----------------------------------------+-------+
| [Mar 1991]             | [Third Recirculation]                  | 11    |
+------------------------+----------------------------------------+-------+
| [Jun 1991]             | [Fourth Recirculation]                 | 11.1  |
+------------------------+----------------------------------------+-------+
| [Sep 1991]             | [Fifth Recirculation]                  | 11.2  |
+------------------------+----------------------------------------+-------+
| [mid-1992]             | [IEEE Standard Board Approves??]       | 12    |
+------------------------+----------------------------------------+-------+
| [Jul 1990 - Apr 1992]  | [Ballot .2a UPE supplement]            |       |
+------------------------+----------------------------------------+-------+

END_RATIONALE
IEEE Standards documents are developed within the Technical Committees of the IEEE Societies and the Standards Coordinating Committees of the IEEE Standards Board. Members of the committees serve voluntarily and without compensation. They are not necessarily members of the Institute. The standards developed within IEEE represent a consensus of the broad expertise on the subject within the Institute as well as those activities outside of IEEE that have expressed an interest in participating in the development of the standard.

Use of an IEEE Standard is wholly voluntary. The existence of an IEEE Standard does not imply that there are no other ways to produce, test, measure, purchase, market, or provide other goods and services related to the scope of the IEEE Standard. Furthermore, the viewpoint expressed at the time a standard is approved and issued is subject to change brought about through developments in the state of the art and comments received from users of the standard. Every IEEE Standard is subjected to review at least every five years for revision or reaffirmation. When a document is more than five years old and has not been reaffirmed, it is reasonable to conclude that its contents, although still of some value, do not wholly reflect the present state of the art. Users are cautioned to check to determine that they have the latest edition of any IEEE Standard.

Comments for revision of IEEE Standards are welcome from any interested party, regardless of membership affiliation with IEEE. Suggestions for changes in documents should be in the form of a proposed change of text, together with appropriate supporting comments.

Interpretations: Occasionally questions may arise regarding the meaning of portions of standards as they relate to specific applications. When the need for interpretations is brought to the attention of the IEEE, the Institute will initiate action to prepare appropriate responses.
Since IEEE Standards represent a consensus of all concerned interests, it is important to ensure that any interpretation has also received the concurrence of a balance of interests. For this reason, the IEEE and the members of its technical committees are not able to provide an instant response to interpretation requests except in those cases where the matter has previously received formal consideration.

Comments on standards and requests for interpretations should be addressed to:

    Secretary, IEEE Standards Board
    445 Hoes Lane
    P.O. Box 1331
    Piscataway, NJ 08855-1331

IEEE Standards documents are adopted by the Institute of Electrical and Electronics Engineers without regard to whether their adoption may involve patents on articles, materials, or processes. Such adoption does not assume any liability to any patent owner, nor does it assume any obligation whatever to parties adopting the standards documents.

Contents                                                            PAGE

Introduction....................................................... ii
    Organization of the Standard.................................... ii
    Base Documents.................................................. ii
    Related Standards Activities.................................... ii
Section 1: General................................................. 1
    1.1 Scope..................................................... 1
    1.2 Normative References...................................... 13
    1.3 Conformance............................................... 14
Section 2: Terminology and General Requirements.................... 21
    2.1 Conventions............................................... 21
    2.2 Definitions............................................... 26
    2.3 Built-in Utilities........................................ 58
    2.4 Character Set............................................. 61
    2.5 Locale.................................................... 69
    2.6 Environment Variables..................................... 119
    2.7 Required Files............................................ 126
    2.8 Regular Expression Notation............................... 128
    2.9 Dependencies on Other Standards........................... 161
    2.10 Utility Conventions....................................... 172
    2.11 Utility Description Defaults.............................. 182
    2.12 File Format Notation...................................... 198
    2.13 Configuration Values...................................... 204
Section 3: Shell Command Language.................................. 215
    3.1 Shell Definitions......................................... 217
    3.2 Quoting................................................... 220
    3.3 Token Recognition......................................... 224
    3.4 Reserved Words............................................ 226
    3.5 Parameters and Variables.................................. 228
    3.6 Word Expansions........................................... 233
    3.7 Redirection............................................... 249
    3.8 Exit Status and Errors.................................... 255
    3.9 Shell Commands............................................ 258
    3.10 Shell Grammar............................................. 279
    3.11 Signals and Error Handling................................ 288
    3.12 Shell Execution Environment............................... 289
    3.13 Pattern Matching Notation................................. 291
    3.14 Special Built-in Utilities................................ 295
Section 4: Execution Environment Utilities......................... 317
    4.1 awk - Pattern scanning and processing language............ 317
    4.2 basename - Return nondirectory portion of pathname........ 358
    4.3 bc - Arbitrary-precision arithmetic language.............. 362
    4.4 cat - Concatenate and print files......................... 383
    4.5 cd - Change working directory............................. 388
    4.6 chgrp - Change file group ownership....................... 392
    4.7 chmod - Change file modes................................. 395
    4.8 chown - Change file ownership............................. 405
    4.9 cksum - Write file checksums and sizes.................... 409
    4.10 cmp - Compare two files................................... 416
    4.11 comm - Select or reject lines common to two files......... 420
    4.12 command - Execute a simple command........................ 424
    4.13 cp - Copy files........................................... 430
    4.14 cut - Cut out selected fields of each line of a file...... 440
    4.15 date - Write the date and time............................ 445
    4.16 dd - Convert and copy a file.............................. 452
    4.17 diff - Compare two files.................................. 462
    4.18 dirname - Return directory portion of pathname............ 471
    4.19 echo - Write arguments to standard output................. 475
    4.20 ed - Edit text............................................ 479
    4.21 env - Set environment for command invocation.............. 498
    4.22 expr - Evaluate arguments as an expression................ 503
    4.23 false - Return false value................................ 509
    4.24 find - Find files......................................... 511
    4.25 fold - Fold lines......................................... 521
    4.26 getconf - Get configuration values........................ 526
    4.27 getopts - Parse utility options........................... 531
    4.28 grep - File pattern searcher.............................. 537
    4.29 head - Copy the first part of files....................... 545
    4.30 id - Return user identity................................. 549
    4.31 join - Relational database operator....................... 554
    4.32 kill - Terminate or signal processes...................... 559
    4.33 ln - Link files........................................... 566
    4.34 locale - Get locale-specific information.................. 570
    4.35 localedef - Define locale environment..................... 577
    4.36 logger - Log messages..................................... 583
    4.37 logname - Return user's login name........................ 586
    4.38 lp - Send files to a printer.............................. 589
    4.39 ls - List directory contents.............................. 595
    4.40 mailx - Process messages.................................. 605
    4.41 mkdir - Make directories.................................. 610
    4.42 mkfifo - Make FIFO special files.......................... 614
    4.43 mv - Move files........................................... 617
    4.44 nohup - Invoke a utility immune to hangups................ 623
    4.45 od - Dump files in various formats........................ 627
    4.46 paste - Merge corresponding or subsequent lines of
         files..................................................... 637
    4.47 pathchk - Check pathnames................................. 642
    4.48 pax - Portable archive interchange........................ 648
    4.49 pr - Print files.......................................... 665
    4.50 printf - Write formatted output........................... 672
    4.51 pwd - Return working directory name....................... 679
    4.52 read - Read a line from standard input.................... 682
    4.53 rm - Remove directory entries............................. 686
    4.54 rmdir - Remove directories................................ 692
    4.55 sed - Stream editor....................................... 695
    4.56 sh - Shell, the standard command language interpreter..... 706
    4.57 sleep - Suspend execution for an interval................. 713
    4.58 sort - Sort, merge, or sequence check text files.......... 716
    4.59 stty - Set the options for a terminal..................... 725
    4.60 tail - Copy the last part of a file....................... 736
    4.61 tee - Duplicate standard input............................ 742
    4.62 test - Evaluate expression................................ 745
    4.63 touch - Change file access and modification times......... 756
    4.64 tr - Translate characters................................. 762
    4.65 true - Return true value.................................. 770
    4.66 tty - Return user's terminal name......................... 772
    4.67 umask - Get or set the file mode creation mask............ 775
    4.68 uname - Return system name................................ 780
    4.69 uniq - Report or filter out repeated lines in a file...... 784
    4.70 wait - Await process completion........................... 790
    4.71 wc - Word, line, and byte count........................... 795
    4.72 xargs - Construct argument list(s) and invoke utility..... 799
Section 5: User Portability Utilities Option....................... 807
Section 6: Software Development Utilities Option................... 809
    6.1 ar - Create and maintain library archives................. 809
    6.2 make - Maintain, update, and regenerate groups of
        programs.................................................. 818
    6.3 strip - Remove unnecessary information from executable
        files..................................................... 844
Section 7: Language-Independent System Services.................... 847
    7.1 Shell Command Interface................................... 848
    7.2 Access Environment Variables.............................. 849
    7.3 Regular Expression Matching............................... 849
    7.4 Pattern Matching.......................................... 850
    7.5 Command Option Parsing.................................... 850
    7.6 Generate Pathnames Matching a Pattern..................... 850
    7.7 Perform Word Expansions................................... 851
    7.8 Get POSIX Configurable Variables.......................... 851
    7.9 Locale Control............................................ 852
Annex A (normative) C Language Development Utilities Option........ 855
    A.1 c89 - Compile Standard C programs......................... 856
    A.2 lex - Generate programs for lexical tasks................. 867
    A.3 yacc - Yet another compiler compiler...................... 884
Annex B (normative) C Language Bindings Option..................... 907
    B.1 C Language Definitions.................................... 908
        B.1.1 POSIX Symbols...................................... 908
        B.1.2 Headers and Function Prototypes.................... 910
        B.1.3 Error Numbers...................................... 911
    B.2 C Numerical Limits........................................ 911
        B.2.1 C Macros for Symbolic Limits....................... 912
        B.2.2 Compile-Time Symbolic Constants for Portability
              Specifications..................................... 913
        B.2.3 Execution-Time Symbolic Constants for Portability
              Specifications..................................... 914
        B.2.4 POSIX.1 C Numerical Limits......................... 915
    B.3 C Binding for Shell Command Interface..................... 915
        B.3.1 C Binding for Execute Command...................... 916
        B.3.2 C Binding for Pipe Communications with Programs.... 919
    B.4 C Binding for Access Environment Variables................ 925
    B.5 C Binding for Regular Expression Matching................. 925
    B.6 C Binding for Match Filename or Pathname.................. 934
    B.7 C Binding for Command Option Parsing...................... 937
    B.8 C Binding for Generate Pathnames Matching a Pattern....... 942
    B.9 C Binding for Perform Word Expansions..................... 948
    B.10 C Binding for Get POSIX Configurable Variables............ 954
    B.11 C Binding for Locale Control.............................. 957
Annex C (normative) FORTRAN Development and Runtime Utilities
    Options......................................................... 959
    C.1 asa - Interpret carriage-control characters............... 960
    C.2 fort77 - FORTRAN compiler................................. 964
Annex D (informative) Bibliography................................. 973
Annex E (informative) Rationale and Notes.......................... 977
    E.1 General................................................... 977
    E.2 Terminology and General Requirements...................... 978
    E.3 Shell Command Language.................................... 979
    E.4 Execution Environment Utilities........................... 980
    E.5 User Portability Utilities Option......................... 993
    E.6 Software Development Utilities Option..................... 993
    E.7 Language-Independent System Services...................... 994
    E.8 C Language Development Utilities Option................... 994
    E.9 C Language Bindings Option................................ 995
    E.10 FORTRAN Development and Runtime Utilities Options......... 996
Annex F (informative) Sample National Profile...................... 997
Annex G (informative) Balloting Instructions....................... 1091
Identifier Index................................................... 1105
Alphabetic Topical Index........................................... 1111

FIGURES

Figure B-1 - Sample system() Implementation....................... 922
Figure B-2 - Sample pclose() Implementation.......................
926 Figure B-3 - Example Regular Expression Matching.................. 933 Figure B-4 - Argument Processing with _gggg_eeee_tttt_oooo_pppp_tttt().................... 942 TABLES Table 2-1 - Typographical Conventions............................. 22 Table 2-2 - Regular Built-in Utilities............................ 58 Table 2-3 - Character Set and Symbolic Names...................... 62 Table 2-4 - Control Character Set................................. 63 Table 2-5 - LC_CTYPE Category Definition in the POSIX Locale...... 76 Table 2-6 - Valid Character Class Combinations.................... 81 Table 2-7 - LC_COLLATE Category Definition in the POSIX Locale.... 84 Table 2-8 - LC_MONETARY Category Definition in the POSIX Locale... 96 Table 2-9 - LC_NUMERIC Category Definition in the POSIX Locale.... 101 Table 2-10 - LC_TIME Category Definition in the POSIX Locale...... 102 Table 2-11 - LC_MESSAGES Category Definition in the POSIX Locale.. 106 Table 2-12 - BRE Precedence....................................... 136 Table 2-13 - ERE Precedence....................................... 139 Table 2-14 - C Standard Operators and Functions................... 171 Table 2-15 - Escape Sequences..................................... 199 Table 2-16 - Utility Limit Minimum Values......................... 205 Table 2-17 - Symbolic Utility Limits.............................. 206 Table 2-18 - Optional Facility Configuration Values............... 212 Table 4-1 - awk Expressions in Decreasing Precedence.............. 322 Table 4-2 - awk Escape Sequences.................................. 347 Table 4-3 - bc Operators.......................................... 370 Table 4-4 - ASCII to EBCDIC Conversion............................ 459 Table 4-5 - ASCII to IBM EBCDIC Conversion........................ 460 Table 4-6 - dirname Examples...................................... 474 Table 4-7 - expr Expressions...................................... 
505 Table 4-8 - od Named Characters................................... 632 Table 4-9 - stty Control Character Names.......................... 730 Table 4-10 - stty Circumflex Control Characters................... 731 Table 7-1 - POSIX.1 Numeric-Valued Configurable Variables......... 853 Table A-1 - lex Table Size Declarations........................... 873 Table A-2 - lex Escape Sequences.................................. 875 Copyright c 1991 IEEE. All rights reserved. This is an unapproved IEEE Standards Draft, subject to change. vi Table A-3 - lex ERE Precedence.................................... 877 Table A-4 - yacc Internal Limits.................................. 903 Table B-1 - POSIX.2 Reserved Header Symbols....................... 911 Table B-2 - _POSIX_C_SOURCE....................................... 911 Table B-3 - C Macros for Symbolic Limits.......................... 914 Table B-4 - C Compile-Time Symbolic Constants..................... 916 Table B-5 - C Execution-Time Symbolic Constants................... 916 Table B-6 - Structure Type _rrrr_eeee_gggg_eeee_xxxx______tttt................................ 928 Table B-7 - Structure Type _rrrr_eeee_gggg_mmmm_aaaa_tttt_cccc_hhhh______tttt............................. 928 Table B-8 - _rrrr_eeee_gggg_cccc_oooo_mmmm_pppp() _cccc_ffff_llll_aaaa_gggg_ssss Argument............................. 928 Table B-9 - _rrrr_eeee_gggg_eeee_xxxx_eeee_cccc() _eeee_ffff_llll_aaaa_gggg_ssss Argument............................. 928 Table B-10 - _rrrr_eeee_gggg_cccc_oooo_mmmm_pppp(), _rrrr_eeee_gggg_eeee_xxxx_eeee_cccc() Return Values................... 932 Table B-11 - _ffff_nnnn_mmmm_aaaa_tttt_cccc_hhhh() _ffff_llll_aaaa_gggg_ssss Argument............................. 937 Table B-12 - Structure Type _gggg_llll_oooo_bbbb______tttt................................ 944 Table B-13 - _gggg_llll_oooo_bbbb() _ffff_llll_aaaa_gggg_ssss Argument................................ 
945 Table B-14 - _gggg_llll_oooo_bbbb() Error Return Values........................... 947 Table B-15 - Structure Type _wwww_oooo_rrrr_dddd_eeee_xxxx_pppp______tttt............................. 950 Table B-16 - _wwww_oooo_rrrr_dddd_eeee_xxxx_pppp() _ffff_llll_aaaa_gggg_ssss Argument............................. 951 Table B-17 - _wwww_oooo_rrrr_dddd_eeee_xxxx_pppp() Return Values.............................. 952 Table B-18 - confstr() _nnnn_aaaa_mmmm_eeee Values................................ 955 Table B-19 - C Bindings for Numeric-Valued Configurable Variables........................................................ 958 Copyright c 1991 IEEE. All rights reserved. This is an unapproved IEEE Standards Draft, subject to change. vii Introduction (This Introduction is not a normative part of P1003.2 Information technology -- Portable Operating System Interface (POSIX) -- Part 2: Shell and Utilities, but is included for information only.) The purpose of this standard is to define a standard interface and environment for application programs that require the services of a ``shell'' command language interpreter and a set of common utility programs. It is intended for systems implementors and application software developers, and is complementary to ISO/IEC 9945-1: 1990 {8} (first in a family of ``POSIX'' standards), which specifies operating system interfaces and source code level functions, based on the UNIX1) system documentation. This standard, or ``POSIX.2,'' is based upon documentation and the knowledge of existing programs that assume an interface and architecture similar to that described by POSIX.1. (See 1.1 for a full description of the relationship between the standards.) The majority of this standard describes the functions of utilities that can interface with application programs. The standard also provides high-level language interfaces that the application uses to access these utilities and other useful, related services. 
These language-independent service interfaces are temporarily described in terms of their C language bindings. The C language assumed is that defined by the C Standard: ANSI/X3.159-1989 Programming Language C Standard, produced by Technical Committee X3J11 of the Accredited Standards Committee X3 -- Information Processing Systems.

Organization of the Standard

The standard is divided into ten parts:

- General, including a statement of scope, normative references, and conformance requirements (Section 1).
- Definitions, general requirements, and the environment available to applications (Section 2).
- The shell command interpreter language (Section 3).
- Descriptions of the utilities in the required ``Execution Environment Utilities'' (Section 4).
- Descriptions of the utilities required for user portability on asynchronous terminals (Section 5 [to be provided in a future revision]).
- Descriptions of the utilities in the optional ``Software Development Utilities'' (Section 6).
- Language-independent interfaces for high-level programming language access to shell and related services (Section 7).
- Descriptions of the utilities in the optional ``C Language Development Utilities'' (Normative Annex A).
- C language bindings to the interfaces in Section 7 (Normative Annex B).
- Descriptions of the utilities in the optional ``FORTRAN Development and Runtime Utilities'' (Normative Annex C).

This introduction, the foreword, any footnotes, NOTES accompanying the text, and the informative annexes are not considered part of the standard. Annexes D through G are informative.

__________
1) UNIX is a registered trademark of UNIX System Laboratories in the USA and other countries.
Base Documents

Many of the interfaces and utilities of this standard were adapted from materials in machine-readable forms donated by the following organizations:

- AT&T: the System V Interface Definition (SVID) {B24},2) Issue 2, Volume 2. Copyright (c) 1986, AT&T; reprinted with permission.
- The X/Open Company, Ltd.: the X/Open Portability Guide {B30} {B31}, Issues II and III, Volume 1. Copyright (c) 1989, X/Open Company, Ltd; reprinted with permission.
- University of California: The UNIX User's Reference Manual {B28}, 4.3 Berkeley Software Distribution, Virtual VAX-11 Version, 1986. Copyright (c) 1980, 1983, The Regents of the University of California; reprinted with permission.3)

Significant reference use was also made of the following books:

- Bolsky, Morris I., Korn, David G., The KornShell Command and Programming Language {B25}, Prentice Hall, Englewood Cliffs, New Jersey (1988).
- Aho, Alfred V., Kernighan, Brian W., Weinberger, Peter J., The AWK Programming Language {B21}, Addison-Wesley, Reading, Massachusetts (1988).

Many other proposals for functions and utilities were received from the various working group members, who are listed in the Acknowledgements section of this standard.

__________
2) The numbers in braces correspond to those of the references in 1.2 (or to the bibliographic entries in Annex D if the number is preceded by the letter B).

Related Standards Activities

Activities to extend this standard to address additional requirements are in progress, and similar efforts can be anticipated in the future.
The following areas are under active consideration at this time, or are expected to become active in the near future:4)

(1) Language-independent service descriptions of POSIX.1 {8}
(2) C, Ada, and FORTRAN Language bindings to (1)
(3) Verification testing methods
(4) Realtime facilities
(5) Secure/Trusted System considerations
(6) Network interface facilities
(7) System Administration
(8) Graphical User Interfaces
(9) Profiles describing application- or user-specific combinations of Open Systems standards for: supercomputing, multiprocessor, and batch extensions; transaction processing; realtime systems; and multiuser systems based on historical models
(10) An overall guide to POSIX-based or related Open Systems standards and profiles

__________
3) The IEEE is grateful to AT&T, UniForum, and the Regents of the University of California for permission to use their machine-readable materials.
4) A Standards Status Report that lists all current IEEE Computer Society standards projects is available from the IEEE Computer Society, 1730 Massachusetts Avenue NW, Washington, DC 20036-1903; Telephone: +1 202 371-0101; FAX: +1 202 728-9614. Working drafts of POSIX standards under development are also available from this office.

Extensions are approved as ``amendments'' or ``revisions'' to this document, following the IEEE and ISO/IEC Procedures. Approved amendments are published separately until the full document is reprinted and such amendments are incorporated in their proper positions.

If you are interested in participating in the TCOS working groups addressing these issues, please send your name, address, and phone number to the Secretary, IEEE Standards Board, Institute of Electrical and Electronics Engineers, Inc., P.O.
Box 1331, 445 Hoes Lane, Piscataway, NJ 08855-1331, and ask to have this forwarded to the chairperson of the appropriate TCOS working group. If you are interested in participating in this work at the international level, contact your ISO/IEC national body.

P1003.2 was prepared by the 1003.2 working group, sponsored by the Technical Committee on Operating Systems and Application Environments of the IEEE Computer Society. At the time this standard was approved, the membership of the 1003.2 working group was as follows:

Technical Committee on Operating Systems and Application Environments (TCOS)
  Chair: Jehan-François Pâris

TCOS Standards Subcommittee
  Chair: Jim Isaak
  Vice Chairs: Ralph Barker, Robert Bismuth, David Dodge, Hal Jespersen, Lorraine Kevra
  Treasurer: Quin Hahn
  Secretary: Shane McCarron

1003.2 Working Group Officials
  Chair: Hal Jespersen
  Vice Chair: Donald W. Cragun
  Editors: Hal Jespersen (1986, 1988-1991), Maggie Lee (1987-1988)
  Secretaries: Helene Armitage (1988-1990), Dave Grindeland (1991), Robert J. Makowski (1987-1988)

Technical Reviewers

Helene Armitage, Keith Bostic, John Caywood, Donald Cragun, David Decot, Ken Faubel, Greger Leijonhufvud, Bob Lenk, Mark Levine, Shane McCarron, Gary Miller, Marc Teitelbaum, Donn Terry, Teoman Topcubasi, David Willcox

Working Group

Helene Armitage, Brian Baird, John R. Barr, Philippe Bertrand, Robert Bismuth, Jim Blondeau, James C. Bohem, Kathy Bohrer, Keith Bostic, Phyllis Eve Bregman, Peter Brouwer, F. Lee Brown, Jr., Jonathan Brown, James A. Capps, Bill Carpenter, Steve Carter, John Caywood, Bob Claeson, Mark Colburn, Donald W. Cragun, Dave Decot, Terence S. Dowling, Stephen Dum, Dominic Dunlop, Mike Edmonds, Ron Elliott, Richard W. Elwood, Hirsaki Eto, Fran Fadden, Ken Faubel, Martin C. Fong, Terance Fong, Glenn Fowler, Gary A. Gaudet, Al Gettier, Timothy D. Gill, Gregory Goddard, Loretta Goudie, Dave Grindeland, John Lawrence Gregg, Jerry Gross, Douglas A. Gwyn, Quin Hahn, Michael J. Hannah, Marjorie E. Harris, David F. Hinnant, Leon M. Holmes, Ron Holt, Randall Howard, Steven A. James, Steve Jennings, Hal Jespersen, Ronald S. Karr, Lorraine C. Kevra, Martin Kirk, Brad Kline, Hiromichi Kogure, David Korn, Rick Kuhn, Mike Lambert, Maggie Lee, Perry Lee, Greger Leijonhufvud, Bob Lenk, Mark Levine, Gary Lindgren, John Lomas, Craig Lund, Rod MacDonald, Dan Magenheimer, Robert J. Makowski, Shane P. McCarron, Jim McGinness, John McGrory, Stuart McKaig, Sunil Mehta, Bill Middlecamp, Gary W. Miller, Jim Moe, Yasushi Nakahara, Martha Nalebuff, Sonya D. Neufer, Landon Noll, Robin T. O'Neill, Jim Oldroyd, Mark Parenti, John Peace, Jon Penner, Gerald Powell, John Quarterman, Joe Ramus, Mike Ressler, Grover Righter, Andrew K. Roach, Marco P. Roodzant, Seth Rosenthal, Maude Sawyer, Norman K. Scherer, Glen Seeds, Jim Selkaitis, Karen Sheaffer, Del Shoemaker, James Soddy, Daniel Steinberg, Scott A. Sutter, Ravi Tavakley, Marc Teitelbaum, Donn Terry, Jack Thompson, Teoman Topcubasi, Eugene Tsuno, Geraldine Vitovitch, Carl vonLoewenfeldt, Mike Wallace, Alan Weaver, Larry Wehr, Bruce Weiner, N. Ray Wilkes, David Willcox, Neil Winton, David Woodend, Morten With, Ken Witte, John Wu, Peggy Younger, Hilary Zaloom

The following persons were members of the 1003.2 Balloting Group that approved the standard for submission to the IEEE Standards Board:

  Derek Kaufman, X/Open Institutional Representative
  Shane McCarron, UNIX International Institutional Representative
  Peter Collinson, USENIX Association Institutional Representative

Scott Anderson, Helene Armitage, David Athersych, Geoff Baldwin, Jerome E. Banasik, Steven E. Barber, Robert M. Barned, David R. Bernstein, Kabekode V. S. Bhat, Robert Bismuth, Jim Blondran, Robert Borochoff, Keith Bostic, James P. Bound, Joseph Boykin, Kevin Brady, Phyllis Eve Bregman, A. Winsor Brown, F. Lee Brown Jr., Luis-Felipe Cabrera, Nicholas A. Camillone, Andres Caravallo, Steven L. Carter, John Caywood, Kilnam Chon, Chan F. Chong, Robert L. Claeson, Mark Colburn, Kenneth N. Cole, Richard Cornelius, William M. Corwin, Mike R. Cossey, William Cox, Donald W. Cragun, Terence Dowling, Stephen A. Dum, John D. Earls, Ron Elliott, Richard W. Elwood, David Emery, Philip H. Enslow, Ken Faubel, Terence Fong, Ed Frankenberry, John A. Gertwagen, Al Gettier, Michel Gien, Gregory W. Goddard, Robert C. Groman, Judy Guist, Gregory Guthrie, Michael J. Hannah, Carol J. Harkness, Craig Harmer, Dale Harris, Myron Hecht, Morris J. Herbert, David F. Hinnant, Lee A. Hollaar, Ronald Holt Jr., Randall Howard, Jim Isaak, Richard James, Hal Jespersen, Greg Jones, Michael J. Karels, Lorraine C. Kevra, Alan W. Kiecker, Jeff Kimmel, M. J. Kirk, Kenneth C. Klingman, Joshua W. Knight, David Korn, Takahiko Kuki, Robin B. Lake, Mike Lambert, Doris Lebovits, Maggie Lee, Greger Leijonhufvud, Robert M. Lenk, David Lennert, Mark E. Levine, Kevin Lewis, Kin F. Li, James P. Lonjers, Joseph F. P. Luhukay, Paul Lustgarten, Ron Mabe, Robert J. Makowski, Roger J. Martin, Joberto S. B. Martins, Yoshihiro Matsumoto, Shane McCarron, Martin J. McGowan III, Marshall Kirk McKusick, Robert W. McWhirter, Doug Michels, Gary W. Miller, James M. Moe, J. W. Moore, Anita Mundkur, Martha Nalebuff, Fred Noz, Alan F. Nugent, Jim R. Oldroyd, Craig Partridge, Rob Peglar, John C. Penney, Rand S. Phares, P. J. Plauger, Gerald Powell, Scott E. Preece, James M. Purtilo, J. S. Quarterman, Wendy Rauch-Hindin, Brad Rhoades, Christopher J. Riddick, Andrew K. Roach, Arnold Robbins, R. Hughes Rowlands, Robert Sarr, Norman Schneidewind, Wolfgang Schwabl, Richard Scott, Glen Seeds, Dan Shia, Roger Shimada, Mukesh Singhal, Richard Sniderman, Steven Sommars, Bryan W. Sparks, Richard Stallman, Daniel Steinberg, Douglas H. Steves, Peter Sugar, Scott A. Sutter, Ravi Tavakley, Donn Terry, Gary F. Tom, A. T. Twigger, Mark-Rene Uchida, L. David Umbaugh, Michael W. Vannier, M. B. Wagner, John W. Walz, Alan G. Weaver, Larry Wehr, Bruce Weiner, Brian Weis, Peter J. Weyman, Andrew E. Wheeler, David Willcox, Jeff Wubik, Oren Yuen, Jason Zions

When the IEEE Standards Board approved this standard on <date to be provided>, it had the following membership:

(to be pasted in by IEEE)

END_RATIONALE

P1003.2/D11.2 Information technology -- Portable Operating System Interface (POSIX) -- Part 2: Shell and Utilities

Section 1: General

1.1 Scope

This standard defines a standard source code level interface to command interpretation, or ``shell,'' services and common utility programs for application programs. These services and programs are complementary to those specified by ISO/IEC 9945-1: 1990 {8}, hereinafter referred to as ``POSIX.1 {8}.''

The standard has been designed to be used by both application programmers and system implementors. However, it is intended to be a reference document and not a tutorial on the use of the services, the utilities, or the interrelationships between the utilities. The emphasis of this standard is on the shell and utility functionality required by application programs (including ``shell scripts'') and not on the direct interactive use of the shell command language or the utilities by humans.

Portions of this standard comprise optional language bindings to system service interfaces. See, for example, the C Language Bindings Option in Annex B.
This standard is intended to describe language interfaces and utilities in sufficient detail so that an application developer can understand the required interfaces without access to the source code of existing implementations on which they may be based. Therefore, it does not attempt to describe the source programming language or internal design of the utilities; they should be considered ``black boxes'' that exhibit the described functionality.

For language interfaces, or functions, this standard has been defined exclusively at the source code level. The objective is that a conforming portable application source program can be translated to execute on a conforming implementation. The standard assumes that the source program may need to be retranslated to produce target code for a new environment prior to execution in that environment.

There is no requirement that the base operating system supporting the shell and utilities be one that fully conforms to ISO/IEC 9945-1: 1990 {8}. (The base system could contain only the subset of POSIX.1 {8} functionality needed to support the requirements of this standard, as described in 2.9.1, and so could not claim full conformance to POSIX.1 {8}.) Furthermore, there is no requirement that the shell command interpreter or any of the standard utilities be written as POSIX.1 {8} conforming programs, or be written in any particular language.

Although not requiring a fully conforming POSIX.1 {8} base, this standard is based upon documentation and the knowledge of existing programs that assume an interface and architecture similar to that described by POSIX.1 {8}. Any questions regarding the definition of terms or the semantics of an underlying concept should be referred to POSIX.1 {8}.

BEGIN_RATIONALE

1.1.1 Scope Rationale.
(This subclause is not a part of P1003.2)

This standard is one of a family of related standards. The term POSIX is correctly used to describe this family, and not only its foundation, the operating system interfaces of POSIX.1 {8}. Therefore, POSIX.2 could colloquially be described as the ``POSIX Shell and Tools Standard.''

The interfaces documented for this standard are to and from high-level language application programs and to and from the utilities themselves; the standard does not directly address the interface with users. The ``source code'' interface to the command interpreter is defined in terms of high-level language functions in 7.1.1 or 7.1.2 (such as system(), B.3.1, or popen(), B.3.2). There are also other function interfaces, such as those for matching regular expressions in 7.3 (regcomp() in B.5). Many of the utilities in this standard, and the shell itself, also accept their own command languages or complex directives as input data, which is also referred to as source code. This data, an ordered series of characters, may be stored in files, or ``scripts,'' that are portable between systems without true recompilation. However, just as with POSIX.1 {8}, the standard addresses only the issue of source code portability between systems; applications using these calls may have to be recompiled or translated when moving from one system to another.

There has been considerable debate concerning the appropriate scope of the work represented by this standard. The following are rational alternatives that have been evaluated:

(1) Define the shell and tools as extensions to POSIX.1 {8}. This would require a fully conforming POSIX.1 {8} system as a base for the new facilities described here.
Vocal proponents for this view have been the members of the POSIX.3 working group, who foresaw difficulties in producing a verification suite standard without having a known operating system base.

(2) Decouple the shell and tools entirely from POSIX.1 {8}. This would potentially allow the standard to be implemented on such popular operating systems as MVS/TSO, VM/CMS, MS-DOS, VMS, etc. Those systems would not have to provide every minor detail of the POSIX.1 {8} language interfaces to conform under this model--only enough to support the shell and tools.

(3) Compromise between options 1 and 2. Base the standard on an interface similar to POSIX.1 {8}, but don't require full conformance. A simple example would be a Version 7 UNIX System, which could not conform to POSIX.1 {8} without considerable modification. However, a vendor could support all of the features of this standard without changing its kernel or binary compatibility. Another example would be a system that conformed to all stated POSIX.1 {8} interfaces, but that didn't have a fully conforming C Standard {7} compiler. The difficulty with this option is that it makes the stated goal of the working group a bit fuzzier and increases the amount of analysis required for the features included.

The working group selected option 3 as its goal. It chose to retain the full UNIX system-like orientation, but did not wish to arbitrarily deprive legitimate systems that could almost conform. No useful feature of shells or commonly used utilities was discarded to accommodate nonconforming base systems; on the other hand, no deliberate obstacles were arbitrarily erected. Furthermore, POSIX.1 {8} is still required for its definitions and architectural concepts, which are purposely not repeated in this standard.

One concrete example of how the two standards interrelate is in the usage of POSIX.1 {8} function names in the descriptions of utilities in POSIX.2. There are a number of historical commands that directly mapped
There are a number of historical commands that directly mapped Copyright c 1991 IEEE. All rights reserved. This is an unapproved IEEE Standards Draft, subject to change. 1.1 Scope 3 P1003.2/D11.2 INFORMATION TECHNOLOGY--POSIX into one of the UNIX system calls. For example: chmod and _c_h_m_o_d(); ln and _l_i_n_k(). The POSIX.2 working group was faced with the problem of having to define all of the complex interactions ``behind the scenes'' for some simple commands. Creating a file, for example, involves many POSIX.1 {8} concepts, including processes, user IDs, multiple group permissions (which are optional), error conditions, etc. Rather than enumerating all of these interactions in many places, the POSIX.2 group chose to employ the POSIX.1 {8} function descriptions, where appropriate. See the chmod utility in 4.7 as an example. The utility description includes the phrase: ... performing actions equivalent to the _c_h_m_o_d() function as defined in the POSIX.1 {8} _c_h_m_o_d() function: This means that the POSIX.2 implementor has to read the POSIX.1 {8} _c_h_m_o_d() description and fully understand all of its functionality, requirements, and side effects, which now don't have to be repeated here. (Admittedly, this makes the POSIX.2 standard a bit more difficult to read, but the working group felt that precision transcended the need for readable or semi-tutorial documents.) The Introduction states that one of the goals of the working group was: ``This interface should be implementable on conforming POSIX.1 {8} systems.'' This implies that the working group has attempted to ensure that no additional functionality or extension is required to implement this standard on the base defined by POSIX.1 {8}. This is not to say that extensions are not allowed, but that they should not be necessary. 
The goal ``(7) Utilities and standards for the installation of applications'' was once interpreted to mean that an elaborate series of tools was required to install and remove applications, based on complex description files and system databases of capabilities. An attempt to provide this was rejected by the balloting group and that type of system is now being evaluated by the POSIX.7 System Administration group. However, the original goal remains in the list, because many of the standard utilities are, in fact, targeted specifically for application installation--make, c89, lex, etc.

1.1.1.1 Existing Practice.

(This subclause is not a part of P1003.2)

The working group would have been very happy to develop a standard that allowed all historical implementations (i.e., those existing prior to the time of publication) to be fully conforming and all historical applications to be Strictly Conforming POSIX Shell Applications without requiring any changes. However, some modifications will be required to reconcile the specific differences between historical implementations; there are many divergent versions of UNIX systems extant and applications have sometimes been written to take advantage of features (or bugs) on specific systems. Therefore, the working group established a set of goals to maximize the value of the standard it eventually produced.

These goals are enumerated in the following subclauses. They are listed in approximate priority sequence, where the first subclause is the most important portability goal.

1.1.1.1.1 Preserve Historical Applications

The most important priority was to ensure that historical applications continued to operate on conforming implementations. This required the selection of many utilities and features from the most prevalent historical implementations.
The working group is relying on the following factors:

(1) Many inconsistent historical features will still be supported as obsolescent.

(2) Common features of System V and BSD will continue to be supported by their sponsors, even if they aren't included here (just as long as they are not prevented from existing).

Therefore, the standard was written so that the large majority of well-written historical applications should continue to operate as Conforming POSIX Shell Applications Using Extensions.

1.1.1.1.2 Clean Up the Interfaces

The working group chose to extend the benefits of historical UNIX systems by making limited improvements to the utility interfaces; numerous complaints have been heard over the years about the inconsistencies in the command line interface, which have allegedly made the system harder for novice users to learn. Given the constraints of the Preserve Historical Applications goal, the working group has made the following general modifications:

(1) Utilities have been extended to deal with differences in character sets, collating sequences, and some cultural aspects relating to the locale of the user. (Examples: new features in regular expressions; new formatting options in date; see 4.15.)

(2) The utility syntax guidelines in 2.10.2 have been applied to almost all of the utilities to promote a consistent interface. The guidelines themselves have been loosened up a bit from their counterparts in the SVID. In many cases historical utilities have not conformed with these guidelines (which were written considerably later than the utilities themselves). The older interfaces have been maintained in the standard as obsolescent features. (Examples: join, sort.) However, in some cases, such as dd and find, such major surgery would have been required that the working group decided to leave the historical interfaces as they are. ``Fixing'' the interface would mean replacing the command, which would not help application portability. So, fixing was limited
to relatively minor abuses of the new guidelines, where reasonable consistency could be achieved while still maintaining the general type of interface of the historical version.

(3) Features that were not generally portable across machine architectures or systems have been removed or marked obsolescent, and new, more portable interfaces have been introduced. (Examples: the octal number methods of describing file modes in chmod and other utilities have been marked obsolescent; the symbolic ``ugo'' method has been extended to other utilities, such as umask.)

(4) Features that have proved to be popular in some specific UNIX system variants have been adopted. (Examples: diff -c, which originated in BSD systems, and the ``new'' awk, from System V.) Such features were selected given the requirements for balloting group consensus; the features had to be used widely enough to balance accusations of ``creeping featurism'' and violations of the UNIX system ``tools philosophy.''

(5) Unreasonable inconsistencies between otherwise similar interfaces have been reconciled. (Example: methods of specifying the patterns to the three grep-related utilities have been made more consistent in the standard's single grep.)

(6) When irreconcilable differences arose between versions of historical utilities, new interfaces (utility names or syntax) were sometimes added in their places. The working group resisted the urge to deviate significantly from historical practice; the new interfaces are generally consistent with the philosophy of historical systems and represent comparable functionality to the interfaces being replaced. In some cases, System V and BSD had diverged (such as with echo and sum) so significantly that no compromises for a common interface were possible.
In these cases, either the divergent features were omitted or an entirely new command name was selected (such as with printf and cksum).

(7) Arbitrary limits to utility operations have been removed. (Example: some historical ed utilities have very limited capabilities for dealing with large files or long input lines.)

(8) Arbitrary limitations on historical extensions have been eliminated. (Example: regular expressions have been described so that the popular \< ... \> extension is allowed.)

(9) Input and output formats have been specified in more detail than historical implementations have required, allowing applications to operate more effectively in pipelines with these utilities. (Example: comm.)

Thus, in many cases the working group could be accused of ``violating Existing Practice,'' and in fact received some balloting objections to that effect from implementors (although rarely from users or application developers). The working group was sensitive to charges that it was engaged in arbitrary software engineering rather than merely codifying existing practice. When changes were made, they were always written to preserve historical applications, but to move new conforming applications into a more consistent, portable environment. This strategy obviously requires changes to historical implementations; the working group carefully evaluated each change, weighing the value to users against the one-time costs of adding the new interfaces (and of possibly breaking applications that took advantage of bugs), generally siding with the users when the costs to implementations and applications were not excessively high.
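Item (3) above prefers the symbolic ``ugo'' notation to octal file modes and extends it to umask. The following minimal sketch shows that style; it assumes a POSIX shell whose umask built-in accepts symbolic operands and a writable ${TMPDIR:-/tmp}, and the function name symbolic_mode_demo is hypothetical. The operands u=rwx,g=rx,o= and u=rw,go=r express what historical scripts wrote as umask 027 and chmod 644.

```sh
#!/bin/sh
# Sketch only: symbolic ("ugo") modes with umask and chmod, in place of
# the octal forms. Assumes ${TMPDIR:-/tmp} is writable; the function
# name is hypothetical.

symbolic_mode_demo() {
    dir=${TMPDIR:-/tmp}/modesketch.$$
    mkdir "$dir" || return 1

    (
        # Subshell, so the new file creation mask does not leak out.
        umask u=rwx,g=rx,o=     # same mask as the octal form 027
        : > "$dir/file"         # new file: rw-r----- (0666 with 027 masked off)
    )
    # Print only the 10-character mode string from "ls -l".
    ls -l "$dir/file" | awk '{ print substr($1, 1, 10) }'

    chmod u=rw,go=r "$dir/file" # same result as the octal "chmod 644"
    ls -l "$dir/file" | awk '{ print substr($1, 1, 10) }'

    rm -rf "$dir"
}

symbolic_mode_demo
# prints:
# -rw-r-----
# -rw-r--r--
```

The symbolic form states which permissions are granted rather than encoding them as bit patterns, which is the portability argument item (3) makes.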
In some cases, changes were reluctantly made that could conceivably break some historical applications; the working group allowed these only in the face of practices it considered rare or significantly misguided.

1.1.1.1.3 Allow Historical Conforming Applications

It is likely that many historical shell scripts will be Strictly Conforming POSIX.2 Applications without requiring modifications. Developers have long been aware of the differences among the historical UNIX system variants and have avoided the nonportable aspects to increase the scope of their applications' marketplace. However, the previous goal of a consistent interface was considered to be quite important, so some applications will require modifications if they are to be maximally portable in the future.

1.1.1.1.4 Preserve Historical Implementations

As explained in 1.1.1.1.2, the requirements for portability and a consistent interface have caused the working group to add new utilities and features. No historical implementation contained all of the attributes required by the working group. Therefore, this lowest-priority goal fell victim to the preceding goals, and every known historical implementation will require some modifications to conform to this standard. The working group took care to ensure that implementations could add the new or modified features without breaking the operation of existing applications. (Note that the standard utilities are not considered applications in this regard, but are part of the implementation. In fact, many or most of the utilities named by this standard will have to change to some extent.)

1.1.1.2 Outside the Scope. (This subclause is not a part of P1003.2)

The following areas are outside the scope of this standard.
This subclause explains more of the rationale behind the exclusions. (It should be noted that this is not an official list. It was not part of the Project Authorization Request submitted to the IEEE, but was devised as a guide to keep the working group discussions on track.)

(1) Operating system administrative commands (privileged processes, system processes, daemons, etc.). The working group followed the lead of the POSIX.1 {8} group in this instance. Administrative commands were felt to be too implementation dependent and not useful for application portability. Subsequent to this decision, a separate POSIX.7 working group was formed to deal with this area of ``operator portability.'' It is anticipated that utilities needed for system administration will be closely coordinated with the POSIX.2 working group.

(2) Commands required for the installation, configuration, or maintenance of operating systems or file systems. This area is similar to item (1). System installation is distinguished from the application installation portion of the Scope by its orientation toward installing the operating system itself, rather than application programs. The exclusion of operating system installation facilities should not be interpreted to mean that the application installation procedures cannot be used for installing operating system components. The proposed interface for this area encountered stiff resistance from the balloting group in Draft 8 and was temporarily withdrawn. As described in Annex E.4, a decision of the balloting group is pending on whether to begin work on a supplement to this standard (POSIX.2b) for application installation.

(3) Networking commands. These were excluded because they are deeply involved with other standards-making bodies and are probably too complicated.
In this case, several working groups were formed within the POSIX family to deal with this area. It is anticipated that utilities needed for networking, if any, will be closely coordinated with the POSIX.2 working group. (In early drafts of this standard, which predated the formation of the networking-specific POSIX working groups, the historical ``UNIX system to UNIX system copy [UUCP]'' programs and protocols were included. These descriptions have been removed in deference to a more appropriate working group.)

(4) Terminal control or user-interface programs (e.g., visual shells, visual editors, window managers, command history mechanisms, etc.). This is probably the most contentious exclusion. A common complaint about many UNIX systems is that they are not very ``user friendly.'' Some people have hoped that the interface to users could be standardized with mice, icon-based desktop metaphors, and so forth. This standard neatly sidesteps those concerns by reminding its audience that it is an application portability standard, and therefore has little relationship to the manner in which users manage their terminals. However, this guideline was not meant to apply to applications. It is perfectly reasonable for an application to assume it can have a user interacting with it. That is why such facilities as displaying strings (with printf) without <newline>s, stty, and various prompting utilities are included in the standard. The interfaces in this standard are strongly oriented toward command lines being issued by shell scripts, or through the system() or popen() functions. Therefore, interactive text editors, pagers, and other user interface tools have been omitted for now.
Alternatively, other standards bodies, such as X3H3.6 and the IEEE TCOS P1201 working group, are devising interfaces that could possibly be more useful and long-lived than any prescribed by POSIX.2.

There is one area of this subject that will be addressed by POSIX.2. The scope of the working group has been expanded to include what is being termed the User Portability Extension, POSIX.2a. This will be published as a supplement to this standard and have the goal of providing a portable environment for relatively expert time-sharing or software development users. It will not attempt to deal with mice or windows or other advanced interfaces at this time, but should cover many of the terminal-oriented utilities, such as a full-screen editor, currently avoided by this edition of POSIX.2.

(5) Graphics programs or interfaces. See the comments on user interface, above.

(6) Text formatting programs or languages. The existing text formatting languages are generally too primitive in scope to satisfy many users, who have relied on a myriad of macro languages. There is an ISO standard text description language, SGML, but this has had insufficient exposure to the UNIX system community for standardization as part of POSIX at this time.

(7) Database programs or interfaces (e.g., SQL, etc.). These interfaces are the province of other standards bodies.

1.1.1.3 Language-Independent Descriptions. (This subclause is not a part of P1003.2)

The POSIX.1 {8} and POSIX.5 working groups are currently engaged in developing the model for language-independent descriptions of system services.
When complete, it will allow the C language bias of the POSIX.1 {8} standard to be excised, and C will take its place among other language bindings that interface with the core services descriptions. The POSIX.2 working group did not wish to duplicate effort, and has therefore been waiting for POSIX.1 {8} to make progress in this area. Thus, like the first version of POSIX.1 {8}, the initial drafts of POSIX.2 start life as a C-only standard, with language independence scheduled to be included in a later draft. Fortunately, this standard is substantially less involved with C than POSIX.1 {8} is. In fact, all of the C interfaces are entirely optional.

1.1.1.4 Base Documents. (This subclause is not a part of P1003.2)

The working group consulted a number of documents in the course of its deliberations, to select utilities and features. There were five primary documents that started off the process:

(1) The System V Interface Definition (SVID), Issue 2, Volume 2.

(2) The X/Open Portability Guide (XPG), Issues II and III, Volume 1.

(3) The UNIX User's Reference Manual, 4.3 Berkeley Software Distribution, Virtual VAX-11 Version. (The printed documentation as well as the online versions provided with the BSD ``Tahoe'' and ``Reno'' distributions were considered as one base document for the POSIX.2 work.)

(4) The KornShell Command and Programming Language, by Bolsky and Korn.

(5) The AWK Programming Language, by Aho, Kernighan, and Weinberger.

The XPG was used most heavily in initial deliberations about which utilities and features to include. The X/Open companies had done a very thorough job in analyzing the SVID and other standards to compile a list
of the most useful and portable utilities. They carefully marked many features that had portability problems, and the working group avoided them for this standard. AT&T, X/Open, and Berkeley provided machine-readable documentation for the use of the working group. However, due to very substantial differences in formatting standards, there is little resemblance between some of the utilities described here and their cousins in the SVID, XPG, and BSD user manual. Nevertheless, early usage of these documents was an invaluable aid in the production of the standard, and the POSIX.2 working group extends its sincere thanks to all three organizations for their generous cooperation.

The biggest divergence in POSIX.2's documentation has been its philosophy of fully specifying interfaces. The SVID and XPG are oriented solely towards application portability. Implementors would have a difficult time writing some of these utilities from the descriptions alone. In fact, both documents freely rely on the potential implementors licensing the source code for the reference systems to complete the specification. The POSIX.2 standard, on the other hand, also has implementors in its audience, and it strove to expand its descriptions wherever useful and feasible. For example, it makes use of BNF grammars to describe complex syntaxes. It attempts to describe the interactions between options, operands, and environment variables, where conflicts can exist. It also attempts to describe all of the useful utility input and output formats. The goal here was to allow application developers to write filters or other programs that could parse the output of any of these utilities or provide meaningful input to them.
To the working group's knowledge, this is a task never before attempted for the historical UNIX system commands; the source code was always so readily available to anyone who really needed to know this information. The two commercial books listed were used as reference materials in preparing information on the shell and the awk language that was more recent and complete than AT&T's or X/Open's documentation.

1.1.1.5 History. (This subclause is not a part of P1003.2)

The 1984 /usr/group Standard was originally intended to include the shell and user-level commands. However, the /usr/group (now known as ``UniForum'') Standards Committee was unable to begin this effort, due to the complexity of the system call and library functions that it eventually did publish. A shell was referred to in the system() function defined by ANSI/X3.159-1989 Programming Language C Standard, but no syntax for the shell command language was attempted.

As the first version of POSIX.1 {8} neared completion, it became apparent that the usefulness of POSIX would be diminished if no shell or utilities were defined. Therefore, the POSIX.2 working group was formed in January 1986 at the Denver, Colorado, meeting of POSIX.1 {8} to address this concern.

The progress of the working group has seemed rather slow during the more than three years of its existence. This is primarily because its membership had substantial overlap with the POSIX.1 {8} working group; for example, the Chair of POSIX.2 was also the Technical Editor of POSIX.1 {8} (and POSIX.2 as well!) at the time. Also, meetings were arbitrarily shortened to allow the POSIX.1 {8} group to move forward as quickly as possible.

1.1.1.6 Internationalization.
(This subclause is not a part of P1003.2)

Some of the utilities and concepts described in this standard contain requirements that standardize multilingual and multicultural support. Most of the internationalized support for this standard was proposed by the UniForum Technical Committee Subcommittee on Internationalization, at the request of the POSIX.2 working group.

UniForum, a nonprofit organization, organizes subcommittees of Technical Committees to do standards research on different topics pertinent to POSIX. The UniForum Subcommittee on Internationalization is one such group. It was formed to propose and promote standard internationalized extensions to POSIX-based systems.

The POSIX.2 working group and the UniForum Subcommittee on Internationalization coordinated their work by the use of liaison members, who attended the meetings of both groups. The interaction between the two groups started when POSIX.2 asked the Subcommittee on Internationalization to provide internationalized support for regular expressions. Later, the Subcommittee on Internationalization was charged with identifying areas in the standard needing changes for internationalized support and proposing those changes.

1.1.1.7 Test Methods. (This subclause is not a part of P1003.2)

The POSIX.3 working group has worked on a test methods specification for verifying conformance to POSIX standards in general and POSIX.1 {8} and POSIX.2 in particular. Test methods for POSIX.2 should be published as a separate document1) sometime after POSIX.2 is approved.

__________
1) See the Foreword for information on the activities of other POSIX working groups.

1.1.1.8 Organization of the Standard.
(This subclause is not a part of P1003.2)

The standard document is organized into sections. Some of these, such as the Scope in 1.1, are mandated by ISO/IEC, the IEEE, and other standards bodies. The remainder of the document is organized into small sections for the convenience of the working group and others.

It has been suggested that all of the utility descriptions (and maybe the functions, too) should be lumped into one large section, all in alphabetical order. This would presumably make it easier for some users to use the document as a reference. The working group deliberately chose not to organize it in this way, for the following reasons:

(1) Certain sections are optional. It is more convenient for the document's internal references, and also for people specifying systems, if these optional sections are in large pieces, rather than a detailed list of utility names.

(2) Future supplements to this standard will be adding new utilities that will also be optional. It would be confusing to try to merge documents at a level below major sections (chapters).

END_RATIONALE

1.2 Normative References

The following standards contain provisions which, through references in this text, constitute provisions of this standard. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this part of this International Standard are encouraged to investigate the possibility of applying the most recent editions of the standards listed below. Members of IEC and ISO maintain registers of currently valid International Standards.

{1} ISO/IEC 646: 1983,2) Information processing--ISO 7-bit coded character set for information interchange.

__________
2) Under revision. (This notation is meant to explicitly reference the 1990 Draft International Standard version of ISO/IEC 646.)
ISO/IEC documents can be obtained from the ISO office, 1, rue de Varembé, Case Postale 56, CH-1211, Genève 20, Switzerland/Suisse.

{2} ISO 1539: 1980, Programming languages--FORTRAN.

{3} ISO 4217: 1987, Codes for the representation of currencies and funds.

{4} ISO 4873: 1986, Information processing--ISO 8-bit code for information interchange--Structure and rules for implementation.

{5} ISO 8859-1: 1987, Information processing--8-bit single-byte coded graphic character sets--Part 1: Latin alphabet No. 1.

{6} ISO 8859-2: 1987, Information processing--8-bit single-byte coded graphic character sets--Part 2: Latin alphabet No. 2.

{7} ISO/IEC 9899: 1990, Information processing systems--Programming languages--C.

{8} ISO/IEC 9945-1: 1990, Information technology--Portable Operating System Interface (POSIX)--Part 1: System Application Program Interface (API) [C Language].

1.3 Conformance

1.3.1 Implementation Conformance

1.3.1.1 Requirements

A conforming implementation shall meet all of the following criteria:

(1) The system shall support all required interfaces defined within this standard. These interfaces shall support the functional behavior described herein. The system shall provide the shell command language described in Section 3 and the utilities in Section 4.
(2) The system may provide one or more of the following: the Software Development Utilities Option, the C Language Bindings Option, the C Language Development Utilities Option, the FORTRAN Development Utilities Option, or the FORTRAN Runtime Utilities Option. When an implementation claims that an optional facility is provided, all of its constituent parts shall be provided.

(3) The system may provide additional or enhanced utilities, functions, or facilities not required by this standard. Nonstandard extensions should be identified as such in the system documentation. Nonstandard extensions, when used, may change the behavior of utilities, functions, or facilities defined by this standard. In such cases, the implementation's conformance document (see 2.2.1.2) shall define an execution environment (i.e., shall provide general operating instructions) in which an application can be run with the behavior specified by the standard. In no case shall such an environment require modification of a Strictly Conforming POSIX.2 Application.

1.3.1.2 Documentation

A conformance document with the following information shall be available for an implementation claiming conformance to this standard. The conformance document shall have the same structure as this standard, with the information presented in the appropriately numbered sections; sections that consist solely of subordinate section titles, with no other information, are not required. The conformance document shall not contain information about extended facilities or capabilities outside the scope of this standard, unless those extensions affect the behavior of a Strictly Conforming POSIX.2 Application; in such cases, the documentation required by the previous subclause shall be included.
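Item (3) of 1.3.1.1 requires the conformance document to define an execution environment in which extensions do not alter the specified behavior. What that environment consists of is left to each implementation. One plausible approach, sketched here under stated assumptions and not prescribed by this standard, is to start from an empty environment, set PATH to the value reported by getconf PATH (a search path that finds the standard utilities), and select the POSIX locale. The variable EXAMPLE_VENDOR_FLAG stands for a hypothetical extension-controlling variable, and run_in_standard_env is likewise a hypothetical name; a real conformance document may require other steps.

```sh
#!/bin/sh
# Sketch only: run a command with the inherited environment discarded,
# so that environment-driven extensions cannot affect it.
# Assumptions: getconf supports the PATH variable (the standard utility
# search path) and the POSIX locale is available. Names are hypothetical.

run_in_standard_env() {
    std_path=$(getconf PATH) || return 1
    # env -i starts from an empty environment; only the variables listed
    # here are passed to the command.
    env -i PATH="$std_path" LC_ALL=POSIX "$@"
}

EXAMPLE_VENDOR_FLAG=on          # hypothetical extension-controlling variable
export EXAMPLE_VENDOR_FLAG

run_in_standard_env sh -c 'printf "%s\n" "${EXAMPLE_VENDOR_FLAG-unset}"'
# prints: unset
```

Because the application itself is run unchanged inside this environment, the approach is consistent with the requirement that no modification of a Strictly Conforming POSIX.2 Application be needed.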
The conformance document shall contain a statement that indicates the full name, number, and date of the standard that applies. The conformance document may also list software standards approved by ISO/IEC or any ISO/IEC member body that are available for use by a Conforming POSIX.2 Application. It should indicate whether it is based on a fully conformant POSIX.1 {8} system. Applicable characteristics where documentation is required by one of these standards, or by standards of government bodies, may also be included.

The conformance document shall describe the symbolic values found in 2.13.2, stating values, the conditions under which those values can change, and the limits of such variations, if any.

The conformance document shall describe the behavior of the implementation for all implementation-defined features defined in this standard. This requirement shall be met by listing these features and providing either a specific reference to the system documentation or the full syntax and semantics of these features. When the value or behavior in the implementation is designed to be variable or customizable on each instantiation of the system, the implementation provider shall document the nature and permissible ranges of this variation.

When information required by this standard is related to the underlying operating system and is already available in the POSIX.1 {8} conformance document, the implementation need not duplicate this information in the POSIX.2 conformance document, but may provide a cross-reference for this purpose.

The conformance document may specify the behavior of the implementation for those features where this standard states that implementations may vary or where features are identified as undefined or unspecified.
No specifications other than those described in this subclause (1.3.1.2) shall be present in the conformance document.

The phrase ``shall be documented'' in this standard means that documentation of the feature shall appear in the conformance document, as described previously, unless the system documentation is explicitly mentioned.

The system documentation should also contain the information found in the conformance document.

1.3.1.3 Conforming Implementation Options

The following symbolic constants, described in 2.13.2, reflect implementation options for this standard that could warrant requirement by Conforming POSIX.2 Applications, or in specifications of conforming systems, or both:

{POSIX2_SW_DEV}      The system supports the Software Development Utilities Option in Section 6.

{POSIX2_C_BIND}      The system supports the C Language Bindings Option in Annex B.

{POSIX2_C_DEV}       The system supports the C Language Development Utilities Option in Annex A.

{POSIX2_FORT_DEV}    The system supports the FORTRAN Development Utilities Option in Annex C.

{POSIX2_FORT_RUN}    The system supports the FORTRAN Runtime Utilities Option in Annex C.

{POSIX2_LOCALEDEF}   The system supports the creation of locales as described in 4.35.

Additional language bindings and development utility options may be provided in other related standards or in future revisions to this standard. In the former case, additional symbolic constants of the same general form as shown in this subclause should be defined by the related standard document and made available to the application, without requiring this POSIX.2 document to be updated.
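An application that can use, but does not require, one of these options may test for it at run time and adapt, rather than assume it is present. A minimal sketch follows. It assumes a getconf utility that accepts the option names listed above (without the braces), and it treats an empty result, -1, undefined, or a getconf failure as ``option not supported''; the values reported for a supported option vary between implementations, so nothing else is inferred from them. The function names has_option and report_options are hypothetical.

```sh
#!/bin/sh
# Sketch only: probe the POSIX.2 option constants at run time.
# Assumption: getconf accepts these names; reported values differ between
# implementations, so only "supported or not" is inferred. Function names
# are hypothetical.

# Succeeds if getconf reports the named option as supported.
has_option() {
    value=$(getconf "$1" 2>/dev/null) || return 1
    case $value in
        ''|-1|undefined) return 1 ;;    # not supported on this system
        *)               return 0 ;;
    esac
}

# Print one line per option, without failing when an option is absent.
report_options() {
    for opt in POSIX2_SW_DEV POSIX2_C_BIND POSIX2_C_DEV \
               POSIX2_FORT_DEV POSIX2_FORT_RUN POSIX2_LOCALEDEF
    do
        if has_option "$opt"; then
            printf '%s: supported\n' "$opt"
        else
            printf '%s: not supported\n' "$opt"
        fi
    done
}

report_options

# Adapting rather than failing when an option is absent.
if has_option POSIX2_C_DEV; then
    printf 'C development utilities available\n'
else
    printf 'falling back to shell-only code paths\n'
fi
```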
1.3.2 Application Conformance

All applications claiming conformance to this standard fall within one of the following categories:

1.3.2.1 Strictly Conforming POSIX.2 Application

A Strictly Conforming POSIX.2 Application is an application that requires only the facilities described in this standard (including any required facilities of the underlying operating system; see 2.9.1). Such an application:

(1) shall accept any implementation behavior that results from actions it takes in areas described in this standard as implementation-defined or unspecified, or where the standard indicates that implementations may vary;

(2) shall not perform any actions that are described as producing undefined results;

(3) for symbolic constants, shall accept any value in the range permitted by this standard, but shall not rely on any value in the range being greater than the minimums listed in this standard;

(4) shall not use facilities designated as obsolescent;

(5) is required to tolerate, and is permitted to adapt to, the presence or absence of optional facilities whose availability is indicated by the constants in 2.13.1, or that are described using the verb ``may.'' However, an application requiring a high-level language binding option can only be considered at best a Conforming POSIX.2 Application; see 1.3.2.2.

Within this standard, any restrictions placed upon a Conforming POSIX.2 Application shall also restrict a Strictly Conforming POSIX.2 Application.

1.3.2.2 Conforming POSIX.2 Application

The term Conforming POSIX.2 Application is used to describe either of the two following application types.
1.3 Conformance 17 P1003.2/D11.2 INFORMATION TECHNOLOGY--POSIX

1.3.2.2.1 ISO/IEC Conforming POSIX.2 Application

An ISO/IEC Conforming POSIX.2 Application is an application that uses only the facilities described in this standard (including the implied facilities of the underlying operating system; see 2.9.1) and approved conforming language bindings for any ISO/IEC standard. Such an application shall include a statement of conformance that documents all options and limit dependencies, and all other ISO/IEC standards used.

1.3.2.2.2 Conforming POSIX.2 Application

A Conforming POSIX.2 Application differs from an ISO/IEC Conforming POSIX.2 Application in that it also may use specific standards of a single ISO/IEC member body, referred to here as ``<National Body>.'' Such an application shall include a statement of conformance that documents all options and limit dependencies, and all other <National Body> standards used.

1.3.2.3 Conforming POSIX.2 Application Using Extensions

A Conforming POSIX.2 Application Using Extensions is an application that differs from a Conforming POSIX.2 Application only in that it uses nonstandard facilities that are consistent with this standard. Such an application shall fully document its requirements for these extended facilities, in addition to the documentation required of a Conforming POSIX.2 Application. A Conforming POSIX.2 Application Using Extensions shall be either an ISO/IEC Conforming POSIX.2 Application Using Extensions or a Conforming POSIX.2 Application Using Extensions (see 1.3.2.2.1 and 1.3.2.2.2).

BEGIN_RATIONALE

1.3.3 Conformance Rationale. (This subclause is not a part of P1003.2)

These conformance definitions are closely related to those in POSIX.1 {8}. The terms Conforming POSIX.2 Application and its variants were selected to parallel the terms used in POSIX.1 {8}.
The descriptions of the ISO/IEC and Conforming POSIX.2 Applications are similar to the same descriptions in POSIX.1 {8}. This is not a duplication of effort, as this standard relies on only a portion of POSIX.1 {8}, as explained in 1.1 and 2.9.1. Therefore, conformance to POSIX.2 has to be described separately from any conformance options or requirements in POSIX.1 {8}.

A reference to a Language-Independent System Services Option was removed from the list of optional features that may be provided by the conforming implementation. That section provided no conformance value, except as a reference point for functions actually provided by a real language binding. Therefore, only the language binding sections remain in the optional list.

The Draft 8 section Language-Dependent Services for the C Programming Language was removed, as this subject is adequately, and appropriately, covered in Annex A.

The documentation requirement for implementation extensions (``shall define an execution environment'') is simply meant to require that the conformance document describe any system-wide or per-user configuration options or environment variables that affect the operation of applications that use the standard utilities and functions. For example, if setting the (imaginary) LC_TRUTH variable causes changes in the exit status of true, the conformance document must describe this condition and how to avoid it (for example, by unsetting the variable in the login script).

For further rationale on the types of conformance, see the POSIX.1 {8} Rationale.

END_RATIONALE
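Item (3) of 1.3.2.1 can be sketched in C as follows. The sketch assumes the later C-language names sysconf(), _SC_LINE_MAX, and _POSIX2_LINE_MAX (the required minimum, 2048 in later revisions of the standard); the helper portable_line_max() is hypothetical. The application accepts whatever value the system reports, but its logic must remain correct when only the minimum is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: the input line length an application may assume.
   The value reported by sysconf() may exceed the minimum, but a strictly
   conforming application must still work when only the minimum is met. */
long portable_line_max(void)
{
    long v = sysconf(_SC_LINE_MAX);

    /* -1 means the limit is indeterminate; treat it, and any value below
       the required minimum, as if only the minimum were guaranteed. */
    if (v < _POSIX2_LINE_MAX)
        return _POSIX2_LINE_MAX;
    return v;
}
```

Sizing a line buffer from portable_line_max() rather than from a hard-coded constant lets the application benefit from larger limits without depending on them.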
Section 2: Terminology and General Requirements

2.1 Conventions

2.1.1 Editorial Conventions

This standard uses the following editorial and typographical conventions. A summary of typographical conventions is shown in Table 2-1.

The Bold Courier font is used to show brackets that denote optional arguments in a utility synopsis, as in

     cut [-c list] [file_name]

These brackets shall not be used by the application unless they are specifically mentioned as literal input characters by the utility description.

There are two types of symbols enclosed in angle brackets (< >):

C-Language Headers   The header name is in the Courier font. When coding C programs, the brackets are used as required by the language.

Parameters           Parameters, also called metavariables, are in italics, such as <directory pathname>. The entire symbol, including the brackets, is meant to be replaced by the value of the symbol described within the brackets.

Numbers within braces, such as ``POSIX.1 {8},'' represent cross references to the Normative References clause (see 1.2). If the number is preceded by a B, it represents a Bibliographic entry (see Annex D). Bibliographic entries are for information only.

In some examples, the Bold Courier font is used to indicate the system's output that resulted from some user input, shown in Courier.
                  Table 2-1 - Typographical Conventions
 ______________________________________________________________________
  Reference                               Example
 ______________________________________________________________________
  C-Language Data Type                    long
  C-Language Function                     system()
  C-Language Function Argument            arg1
  C-Language Global External              errno
  C-Language Header
  C-Language Keyword                      #define
  Cross Reference: Annex                  Annex A
  Cross Reference: Clause                 2.3
  Cross Reference: Other Standard         ISO 9999-1 {n}
  Cross Reference: Section                Section 2
  Cross Reference: Subclause              2.3.4, 2.3.4.5, 2.3.4.5.6
  Defined Term                            (see text)
  Environment Variable                    PATH
  Error Number                            [EINTR]
  Example Input                           echo foo
  Example Output                          foo
  Figure Reference                        Figure 7
  File Name                               /tmp
  Parameter                               <directory pathname>
  Special Character
  Symbolic Constant, Limit                {_POSIX_VDISABLE}, {LINE_MAX}
  Table Reference                         Table 6
  Utility Name                            awk
  Utility Operand                         file_name
  Utility Option                          -c
  Utility Option with Option-Argument     -w width
 ______________________________________________________________________

Defined terms are shown in three styles, depending on context:

  (1) Terms defined in 2.2.1, 2.2.2, and 3.1 are expressed as subclause titles. Alternative forms of the terms appear in [brackets].

  (2) The initial appearances of other terms, applying to a limited portion of the text, are in italics.

  (3) Subsequent appearances of the term are in the Roman font.

Symbolic constants are shown in two styles: those within curly braces are intended to call the reader's attention to values in <limits.h> and <unistd.h>; those without braces are usually defined by one or a few related functions. There is no semantic difference between these two
forms of presentation.

Filenames and pathnames are shown in Courier. When a pathname is shown starting with ``$HOME/'', this indicates that the remaining components of the pathname are to be resolved relative to the directory named by the user's HOME environment variable.

The style selected for some of the special characters, such as <newline>, matches the form of the input given to the localedef utility (see 2.5.2). Generally, the characters selected for this special treatment are those that are not visually distinct, such as the control characters <tab> or <newline>.

Literal characters and strings used as input or output are shown in various ways, depending on context:

%, begin     When no confusion would result, the character or string is rendered in the Courier font and used directly in the text.

'c'          In some cases, a character is enclosed in single-quote characters, similar to a C-language character constant. Unless otherwise noted, the quotes shall not be used as input or output.

"string"     In some cases, a string is enclosed in double-quote characters, similar to a C-language string constant. Unless otherwise noted, the quotes shall not be used as input or output.

Defined names that are usually in lowercase, particularly function names, are never used at the beginning of a sentence or anywhere else that regular English usage would require them to be capitalized.

Parenthetical expressions within normative text also contain normative information. The general typographic hierarchy of parenthetical expressions is:

     { [ ( ) ] }

The square brackets are most frequently used to enclose a parenthetical expression that contains a function name [such as waitpid()], with its built-in parentheses.

In some cases, tabular information is presented inline; in others it is presented in a separately labeled Table.
This arrangement was employed purely for ease of reference, and there is no normative difference between these two cases.

Annexes marked as normative are parts of the standard that pose requirements, exactly the same as the numbered Sections, but have been moved to near the end of the document for clarity of exposition. Informative Annexes are for information only and pose no requirements. All material preceding page 1 of the document (the ``front matter'') and the two indexes at the end are also only informative.

NOTES that appear in a smaller point size and are indented have one of two different meanings, depending on their location:

  - When they are within the normal text of the document, they are the same as footnotes: informative, posing no requirements on implementations or applications.

  - When they are attached to Tables or Figures, they are normative, posing requirements.

Text marked as examples (including the use of ``e.g.'') is for information only. The exception to this comes in the C-language programs and program fragments used to represent algorithms, as described in 2.1.3.

The typographical conventions listed here are for ease of reading only. Editorial inconsistencies in the use of typography are unintentional and have no normative meaning in this standard.

2.1.2 Grammar Conventions

Portions of this standard are expressed in terms of a special grammar notation. It is used to portray the complex syntax of certain program input. The grammar is based on the syntax used by the yacc utility (see A.3). However, it does not represent fully functional yacc input, suitable for program use: the lexical processing and all semantic requirements are described only in textual form.
The grammar is not based on source used in any traditional implementation and has not been tested with the semantic code that would normally be required to accompany it. Furthermore, there is no implication that the partial yacc code presented represents the most efficient, or only, means of supporting the complex syntax within the utility. Implementations may use other programming languages or algorithms, as long as the syntax supported is the same as that represented by the grammar.

The following typographical conventions are used in the grammar; they have no significance except to aid in reading.

  - The identifiers for the reserved words of the language are shown with a leading capital letter. (These are terminals in the grammar. Examples: While, Case.)

  - The identifiers for terminals in the grammar are all named with uppercase letters and underscores. Examples: NEWLINE, ASSIGN_OP, NAME.

  - The identifiers for nonterminals are all lowercase.

2.1.3 Miscellaneous Conventions

This standard frequently uses the C language to express algorithms in terms of programs or program fragments. The following shall be considered in reading this code:

  - The programs use the syntax and semantics described by the C Standard {7}.

  - The programs are merely examples and do not represent the most efficient, or only, means of coding the interface. Implementations may use other programming languages or algorithms, as long as the results are the same as those achieved by the programs in this standard.

  - C-language comments are informative and pose no requirements.
Further conventions are presented in:

  - Utility Conventions, 2.10, describing utility and application command-line syntax

  - File Format Notation, 2.12, describing the notation used to represent utility input and output

2.1.4 Conventions Rationale. (This subclause is not a part of P1003.2)

The C language was chosen for many examples because:

  - It eliminates any requirement to document a different pseudocode.

  - It is a familiar language to many of the potential readers of POSIX.2.

  - It is the language most widely used for historical implementations of the utilities.

2.2 Definitions

2.2.1 Terminology

For the purposes of this standard, the following definitions apply:

2.2.1.1 can: The word can is to be interpreted as describing a permissible optional feature or behavior available to the application; the implementation shall support such features or behaviors as mandatory requirements.

2.2.1.2 conformance document: A document provided by an implementor that contains implementation details as described in 1.3.1.2.

2.2.1.3 implementation: An object providing to applications and users the services defined by this standard. The word implementation is to be interpreted to mean that object, after it has been modified in accordance with the manufacturer's instructions to:

  - configure it for conformance with this standard;

  - select some of the various optional facilities described by this standard, through customization by local system administrators or operators.

An exception to this meaning occurs when discussing conformance documentation or using the term implementation defined. See 2.2.1.4 and 1.3.1.2.
2.2.1.4 implementation defined: When a value or behavior is described by this standard as implementation defined, the implementation provider shall document the requirements for correct program construction and correct data in the use of that value or behavior. When the value or behavior in the implementation is designed to be variable or customizable on each instantiation of the system, the implementation provider shall document the nature and permissible ranges of this variation. (See 1.3.1.2.)

2.2.1.5 may: The word may is to be interpreted as describing an optional feature or behavior of the implementation that is not required by this standard, but there is no prohibition against providing it. A Strictly Conforming POSIX.2 Application is permitted to use such features, but shall not rely on the implementation's actions in such cases. To avoid ambiguity, the reverse sense of may is not expressed as may not, but as need not.

2.2.1.6 obsolescent: Certain features are obsolescent, which means that they may be considered for withdrawal in future revisions of this standard. They are retained in this version because of their widespread use. Their use in new applications is discouraged.

2.2.1.7 shall: In this standard, the word shall is to be interpreted as a requirement on the implementation or on Strictly Conforming POSIX.2 Applications, where appropriate.

2.2.1.8 should: With respect to implementations, the word should is to be interpreted as an implementation recommendation, but not a requirement. With respect to applications, the word should is to be interpreted as recommended programming practice for applications and a requirement for Strictly Conforming POSIX.2 Applications.
2.2.1.9 system documentation: All documentation provided with an implementation, except the conformance document. Electronically distributed documents for an implementation are considered part of the system documentation.

2.2.1.10 undefined: A value or behavior is undefined if the standard imposes no portability requirements on applications for erroneous program construction, erroneous data, or use of an indeterminate value. Implementations (or other standards) may specify the result of using that value or causing that behavior. An application using such behaviors is using extensions, as defined in 1.3.2.3.

2.2.1.11 unspecified: A value or behavior is unspecified if the standard imposes no portability requirements on applications for a correct program construction or correct data. Implementations (or other standards) may specify the result of using that value or causing that behavior. An application requiring a specific behavior, rather than tolerating any behavior when using that functionality, is using extensions, as defined in 1.3.2.3.

BEGIN_RATIONALE

2.2.1.12 Terminology Rationale (This subclause is not a part of P1003.2)

Most of these terms were adapted from their POSIX.1 {8} counterparts with little modification.

The reader is referred to the definition of program in 2.2.2.119 to understand the expression ``program construction.'' This standard's use of program differs from that of POSIX.1 {8}, which emphasizes only high-level languages, because this standard is also concerned with utility and command language interactions.
Included in the scope of program construction are:

  (1) Shell command language

  (2) Command arguments

  (3) Regular expressions, of various types

  (4) Command input language syntax, such as awk, bc, ed, lex, make, sed, and yacc. Some of these are so complex that they rival traditional high-level languages.

The terms can and may were selected to contrast optional application behavior (can) against optional implementation behavior (may).

The term supported was removed from Draft 8; it had originally been copied from the POSIX.1 {8} document, but it later became clear that its requirement for function ``stubs'' for unsupported functions made little sense in this standard. The term support therefore reverts to its English-language meaning.

The term obsolescent was changed to deprecated in some earlier drafts, but it was restored to match POSIX.1 {8}'s use of the term. It means ``do not use this feature in new applications.'' The obsolescence concept is not an ideal solution, but was used as a method of increasing consensus: many more objections would be heard from the user community if some of these historical features were suddenly withdrawn without the grace period obsolescence implies. The phrase ``may be considered for withdrawal in future revisions'' implies that the result of that consideration might in fact keep those features indefinitely if most applications do not migrate away from them quickly.

END_RATIONALE

2.2.2 General Terms

For the purposes of this standard, the following definitions apply.

2.2.2.1 absolute pathname: See pathname resolution in 2.2.2.104.

2.2.2.2 address space: The memory locations that can be referenced by a process.
[POSIX.1 {8}]

2.2.2.3 affirmative response: An input string that matches one of the responses acceptable to the LC_MESSAGES category keyword yesexpr, matching an extended regular expression in the current locale; see 2.5.

2.2.2.4 <alert>: A character that in the output stream shall indicate that a terminal should alert its user via a visual or audible notification. The <alert> shall be the character designated by '\a' in the C language binding. It is unspecified whether this character is the exact sequence transmitted to an output device by the system to accomplish the alert function.

2.2.2.5 angle brackets: The characters ``<'' (left-angle-bracket) and ``>'' (right-angle-bracket). When used in the phrase ``enclosed in angle brackets,'' the symbol ``<'' shall immediately precede the object to be enclosed, and ``>'' shall immediately follow it. When describing these characters in 2.4, the names <less-than-sign> and <greater-than-sign> are used.

2.2.2.6 appropriate privileges: An implementation-defined means of associating privileges with a process with regard to the function calls and function call options defined in POSIX.1 {8} that need special privileges. There may be zero or more such means. [POSIX.1 {8}]

2.2.2.7 argument: A parameter passed to a utility as the equivalent of a single string in the argv array created by one of the POSIX.1 {8} exec functions. See 2.10.1 and 3.9.1.1. An argument is one of the options, option-arguments, or operands following the command name.

2.2.2.8 asterisk: The character ``*''.

2.2.2.9 background process: A process that is a member of a background process group. [POSIX.1 {8}]

2.2.2.10 background process group: Any process group, other than a foreground process group, that is a member of a session that has established a connection with a controlling terminal.
[POSIX.1 {8}]

2.2.2.11 backquote: The character `` ` '', also known as a grave accent.

2.2.2.12 backslash: The character ``\'', also known as a reverse solidus.

2.2.2.13 <backspace>: A character that normally causes printing (or displaying) to occur one column position previous to the position about to be printed. The <backspace> shall be the character designated by '\b' in the C language binding. It is unspecified whether this character is the exact sequence transmitted to an output device by the system to accomplish the backspace function. The character defined here is not necessarily the ERASE special character defined in POSIX.1 {8} 7.1.1.9.

2.2.2.14 basename: The final, or only, filename in a pathname.

2.2.2.15 basic regular expression: A pattern (sequence of characters or symbols) constructed according to the rules defined in 2.8.3.

2.2.2.16 <blank>: One of the characters that belong to the blank character class as defined via the LC_CTYPE category in the current locale. In the POSIX Locale, a <blank> is either a <space> or a <tab>.

2.2.2.17 blank line: A line consisting solely of zero or more <blank>s terminated by a <newline>. See also empty line (2.2.2.44).

2.2.2.18 block special file: A file that refers to a device. A block special file is normally distinguished from a character special file by providing access to the device in a manner such that the hardware characteristics of the device are not visible. [POSIX.1 {8}]

2.2.2.19 braces: The characters ``{'' (left brace) and ``}'' (right brace), also known as curly braces. When used in the phrase ``enclosed in (curly) braces,'' the symbol ``{'' shall immediately precede the object to be enclosed, and ``}'' shall immediately follow it. When describing these characters in 2.4, the names <left-brace> and <right-brace> are used.
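The affirmative-response definition (2.2.2.3) can be illustrated with a short C sketch. It assumes the nl_langinfo() item YESEXPR from later POSIX C bindings for reading the yesexpr keyword of LC_MESSAGES, plus regcomp() and regexec() for matching an extended regular expression; the helper is_affirmative() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if RESPONSE is an affirmative response
   in the current locale, 0 if it is not, and -1 if the locale's yesexpr
   pattern cannot be compiled. */
int is_affirmative(const char *response)
{
    regex_t re;
    int rc;

    /* YESEXPR retrieves the LC_MESSAGES yesexpr value, which is an
       extended regular expression. */
    if (regcomp(&re, nl_langinfo(YESEXPR), REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, response, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

A program would normally call setlocale(LC_ALL, "") first so that LC_MESSAGES reflects the user's environment. In the POSIX locale, yesexpr is ^[yY], so "y" and "Yes" are affirmative and "no" is not.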
2.2.2.20 brackets: The characters ``['' (left-bracket) and ``]'' (right-bracket), also known as square brackets. When used in the phrase ``enclosed in (square) brackets,'' the symbol ``['' shall immediately precede the object to be enclosed, and ``]'' shall immediately follow it. When describing these characters in 2.4, the names <left-square-bracket> and <right-square-bracket> are used.

2.2.2.21 built-in utility: A utility implemented within a shell. The utilities referred to as special built-ins have special qualities, described in 3.14. Unless qualified, the term built-in includes the special built-in utilities. The utilities referred to as regular built-ins are those named in Table 2-2. As indicated in 2.3, there is no requirement that these utilities actually be built into the shell on the implementation, only that they have special command-search qualities.

2.2.2.22 byte: An individually addressable unit of data storage that is equal to or larger than an octet, used to store a character or a portion of a character; see 2.2.2.24. A byte is composed of a contiguous sequence of bits, the number of which is implementation defined. The least significant bit is called the low-order bit; the most significant is called the high-order bit. [POSIX.1 {8}]

NOTE: This definition of byte is actually from the C Standard {7} because POSIX.1 {8} merely references it without copying the text. It has been reworded slightly to clarify its intent without introducing the C Standard {7} terminology ``basic execution character set,'' which is inapplicable to this standard. It deviates intentionally from the usage of byte in some other standards, where it is used as a synonym for octet (always eight bits).
On a POSIX.1 {8} system, a byte may be larger than eight bits so that it can be an integral portion of larger data objects that are not evenly divisible by eight bits (such as a 36-bit word that contains four 9-bit bytes).

2.2.2.23 <carriage-return>: A character that in the output stream shall indicate that printing should start at the beginning of the same physical line in which the <carriage-return> occurred. The <carriage-return> shall be the character designated by '\r' in the C language binding. It is unspecified whether this character is the exact sequence transmitted to an output device by the system to accomplish the movement to the beginning of the line.

2.2.2.24 character: A sequence of one or more bytes representing a single graphic symbol.

NOTE: This term corresponds in the C Standard {7} to the term multibyte character, noting that a single-byte character is a special case of multibyte character. Unlike the usage in the C Standard {7}, character here has no necessary relationship with storage space, and byte is used when storage space is discussed. [POSIX.1 {8}]

(See 2.4 for a further explanation of the graphical representations of characters, or ``glyphs,'' versus character encodings.)

2.2.2.25 character class: A named set of characters sharing an attribute associated with the name of the class. The classes and the characters that they contain are dependent on the value of the LC_CTYPE category in the current locale; see 2.5.

2.2.2.26 character special file: A file that refers to a device. One specific type of character special file is a terminal device file, whose access is defined in POSIX.1 {8} section 7.1. Other character special files have no structure defined by this standard, and their use is unspecified by this standard. [POSIX.1 {8}]

2.2.2.27 circumflex: The character ``^''.
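The distinction between bytes (2.2.2.22) and characters (2.2.2.24) can be made concrete with a C sketch. It uses the C Standard's CHAR_BIT and mbrlen() to count characters, rather than bytes, according to the LC_CTYPE category of the current locale; the helper names bits_per_byte() and count_characters() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Number of bits in a byte: implementation defined, but at least 8. */
int bits_per_byte(void)
{
    return CHAR_BIT;
}

/* Hypothetical helper: counts characters (not bytes) in S using the
   LC_CTYPE category of the current locale. Returns (size_t)-1 if S
   contains an invalid or incomplete multibyte sequence. */
size_t count_characters(const char *s)
{
    mbstate_t st;
    size_t n = 0;
    size_t left = strlen(s);

    memset(&st, 0, sizeof st);
    while (left > 0) {
        size_t len = mbrlen(s, left, &st);

        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;
        /* len is never 0 here: strlen() excluded the null character. */
        s += len;
        left -= len;
        n++;
    }
    return n;
}
```

In the POSIX locale every character is one byte, so count_characters("abc") is 3; in a locale with a multibyte encoding, a string can contain fewer characters than bytes.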
2.2.2.28 collating element: The smallest entity used to determine the logical ordering of strings. See collation sequence (2.2.2.30). A collating element shall consist of either a single character, or two or more characters collating as a single entity. The value of the LC_COLLATE category in the current locale determines the current set of collating elements.

2.2.2.29 collation: The logical ordering of strings according to defined precedence rules. These rules identify a collation sequence between the collating elements, and such additional rules that can be used to order strings consisting of multiple collating elements.

2.2.2.30 collation sequence: The relative order of collating elements as determined by the setting of the LC_COLLATE category in the current locale. The character order, as defined for the LC_COLLATE category in the current locale (see 2.5.2.2), defines the relative order of all collating elements, such that each element occupies a unique position in the order. In addition, one or more collation weights can be assigned for each collating element; these weights are used to determine the relative order of strings in, e.g., the sort utility.

Multilevel sorting is accomplished by assigning elements one or more collation weights, up to the limit {COLL_WEIGHTS_MAX}. On each level, elements may be given the same weight (at the primary level, called an equivalence class; see 2.2.2.47) or be omitted from the sequence. Strings that collate equal using the first assigned weight (primary ordering) are then compared using the next assigned weight (secondary ordering), and so on.

2.2.2.31 column position: A unit of horizontal measure related to characters in a line.
It is assumed that each character in a character set has an intrinsic column width independent of any output device. Each printable character in the portable character set has a column width of one. The standard utilities, when used as described in this standard, assume that all characters have integral column widths. The column width of a character is not necessarily related to the internal representation of the character (numbers of bits or octets).

The column position of a character in a line is defined as one plus the sum of the column widths of the preceding characters in the line. Column positions are numbered starting from 1.

2.2.2.32 command: A directive to the shell to perform a particular task; see 3.9.

2.2.2.33 current working directory: See working directory in 2.2.2.159.

2.2.2.34 command language interpreter: See 2.2.2.133.

2.2.2.35 directory: A file that contains directory entries. No two directory entries in the same directory shall have the same name. [POSIX.1 {8}]

2.2.2.36 directory entry [link]: An object that associates a filename with a file. Several directory entries can associate names with the same file. [POSIX.1 {8}]

2.2.2.37 dollar-sign: The character ``$''. This standard permits the substitution of the ``currency symbol'' graphic defined in ISO/IEC 646 {1} for this symbol when the character set being used has substituted that graphic for the graphic $. The graphic symbol $ is always used in this standard, but not in any monetary sense.

2.2.2.38 dot: The filename consisting of a single dot character (.). See pathname resolution in 2.2.2.104. [POSIX.1 {8}] In the context of shell special built-in utilities, see 3.14.4.

2.2.2.39 dot-dot: The filename consisting solely of two dot characters (..).
See _p_a_t_h_n_a_m_e _r_e_s_o_l_u_t_i_o_n in 2.2.2.104. [POSIX.1 {8}] 2.2.2.40 double-quote: The character ``"'', also known as _q_u_o_t_a_t_i_o_n- _m_a_r_k. 2.2.2.41 effective group ID: An attribute of a process that is used in determining various permissions, including file access permissions, described in 2.2.2.55. See _g_r_o_u_p _I_D. This value is subject to change during the process lifetime, as described in POSIX.1 {8} 3.1.2 (_e_x_e_c) and 4.2.2 [_s_e_t_g_i_d()]. [POSIX.1 {8}] 2.2.2.42 effective user ID: An attribute of a process that is used in determining various permissions, including file access permissions. Copyright c 1991 IEEE. All rights reserved. This is an unapproved IEEE Standards Draft, subject to change. 34 2 Terminology and General Requirements Part 2: SHELL AND UTILITIES P1003.2/D11.2 See _u_s_e_r _I_D. This value is subject to change during the process lifetime, as described in POSIX.1 {8} 3.1.2 (_e_x_e_c) and 4.2.2 [_s_e_t_u_i_d()]. [POSIX.1 {8}] 2.2.2.43 empty directory: A directory that contains, at most, directory entries for dot and dot-dot. [POSIX.1 {8}] 2.2.2.44 empty line: A line consisting of only a character. See also _b_l_a_n_k _l_i_n_e (2.2.2.17). 2.2.2.45 empty string [null string]: A character array whose first element is a null character. [POSIX.1 {8}] 2.2.2.46 Epoch: The time 0 hours, 0 minutes, 0 seconds, January 1, 1970, Coordinated Universal Time. See _s_e_c_o_n_d_s _s_i_n_c_e _t_h_e _E_p_o_c_h. [POSIX.1 {8}] 2.2.2.47 equivalence class: A set of collating elements with the same 1 primary collation weight. 1 Elements in an equivalence class are typically elements that naturally group together, such as all accented letters based on the same base letter. The collation order of elements within an equivalence class is determined 1 by the weights assigned on any subsequent levels after the primary 1 weight. 
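The level-by-level comparison described for collation sequences (2.2.2.30) and equivalence classes (2.2.2.47) can be sketched in C as follows. This is a minimal illustration, not the method of any particular implementation: the weight table, the two-level limit COLL_LEVELS (a stand-in for {COLL_WEIGHTS_MAX}), and the convention that a weight of 0 means "omitted at this level" are all assumptions made for the example.

```c
#include <stddef.h>

/* Assumption: a toy locale with two weight levels (at most {COLL_WEIGHTS_MAX}). */
#define COLL_LEVELS 2

/* One collating element; weight[0] is primary, weight[1] secondary.
 * Hypothetical convention: a weight of 0 means the element is omitted
 * from the sequence at that level. */
struct coll_elem {
    int weight[COLL_LEVELS];
};

/* Compare two strings of collating elements, returning <0, 0, or >0
 * in the manner of strcmp(). Later levels are consulted only when all
 * earlier levels compare equal. */
int multilevel_compare(const struct coll_elem *a, size_t na,
                       const struct coll_elem *b, size_t nb)
{
    for (int level = 0; level < COLL_LEVELS; level++) {
        size_t i = 0, j = 0;
        for (;;) {
            /* Skip elements omitted at this level. */
            while (i < na && a[i].weight[level] == 0) i++;
            while (j < nb && b[j].weight[level] == 0) j++;
            if (i == na || j == nb)
                break;
            if (a[i].weight[level] != b[j].weight[level])
                return a[i].weight[level] < b[j].weight[level] ? -1 : 1;
            i++;
            j++;
        }
        /* A string with weighted elements left over sorts after the other. */
        if (i < na) return 1;
        if (j < nb) return -1;
        /* Equal at this level: fall through to the next level. */
    }
    return 0;
}
```

With toy weights e = {5, 1}, é = {5, 2}, and f = {6, 1}, the letters e and é form one equivalence class (same primary weight), so "é" sorts before "f" on the primary level, while "e" and "é" are ordered only by their secondary weights.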
2.2.2.48 executable file: A regular file acceptable as a new process image file by the equivalent of the POSIX.1 {8} exec family of functions, and thus usable as one form of a utility. See exec in POSIX.1 {8} 3.1.2. The standard utilities described as compilers can produce executable files, but other unspecified methods of producing executable files may also be provided. The internal format of an executable file is unspecified, but a conforming application shall not assume an executable file is a text file.
2.2.2.49 execute: To perform the actions described in 3.9.1.1. See also invoke (2.2.2.79).
2.2.2.50 extended regular expression: A pattern (sequence of characters or symbols) constructed according to the rules defined in 2.8.4.
2.2.2.51 extended security controls: A concept of the underlying system, as follows. [POSIX.1 {8}]
The access control (see file access permissions in 2.2.2.55) and privilege (see appropriate privileges in 2.2.2.6) mechanisms have been defined to allow implementation-defined extended security controls. These permit an implementation to provide security mechanisms that implement security policies different from those described in POSIX.1 {8}. These mechanisms shall not alter or override the defined semantics of any of the functions in POSIX.1 {8}.
2.2.2.52 feature test macro: A #defined symbol used to determine whether a particular set of features will be included from a header. See POSIX.1 {8} 2.7.1. [POSIX.1 {8}]
2.2.2.53 FIFO special file [FIFO]: A type of file with the property that data written to such a file is read on a first-in-first-out basis. Other characteristics of FIFOs are described in POSIX.1 {8} 5.3.1 [open()], 6.4.1 [read()], 6.4.2 [write()], and 6.5.3 [lseek()]. [POSIX.1 {8}]
2.2.2.54 file: An object that can be written to, or read from, or both. A file has certain attributes, including access permissions and type. File types include regular file, character special file, block special file, FIFO special file, and directory. Other types of files may be defined by the implementation. [POSIX.1 {8}]
2.2.2.55 file access permissions: A concept of the underlying system, as follows. [POSIX.1 {8}]
The standard file access control mechanism uses the file permission bits, as described below. These bits are set at file creation by open(), creat(), mkdir(), and mkfifo() and are changed by chmod(). These bits are read by stat() or fstat().
Implementations may provide additional or alternate file access control mechanisms, or both. An additional access control mechanism shall only further restrict the access permissions defined by the file permission bits. An alternate access control mechanism shall:
    (1) Specify file permission bits for the file owner class, file group class, and file other class of the file, corresponding to the access permissions, to be returned by stat() or fstat().
    (2) Be enabled only by explicit user action, on a per-file basis by the file owner or a user with the appropriate privilege.
    (3) Be disabled for a file after the file permission bits are changed for that file with chmod(). The disabling of the alternate mechanism need not disable any additional mechanisms defined by an implementation.
Whenever a process requests file access permission for read, write, or execute/search, if no additional mechanism denies access, access is determined as follows:
    (1) If a process has the appropriate privilege:
        (a) If read, write, or directory search permission is requested, access is granted.
        (b) If execute permission is requested, access is granted if execute permission is granted to at least one user by the file permission bits or by an alternate access control mechanism; otherwise, access is denied.
    (2) Otherwise:
        (a) The file permission bits of a file contain read, write, and execute/search permissions for the file owner class, file group class, and file other class.
        (b) Access is granted if an alternate access control mechanism is not enabled and the requested access permission bit is set for the class (file owner class, file group class, or file other class) to which the process belongs, or if an alternate access control mechanism is enabled and it allows the requested access; otherwise, access is denied.
2.2.2.56 file descriptor: A per-process unique, nonnegative integer used to identify an open file for the purpose of file access. [POSIX.1 {8}]
2.2.2.57 file group class: The property of a file indicating access permissions for a process related to the process's group identification. A process is in the file group class of a file if the process is not in the file owner class and if the effective group ID or one of the supplementary group IDs of the process matches the group ID associated with the file. Other members of the class may be implementation defined. [POSIX.1 {8}]
2.2.2.58 file hierarchy: A concept of the underlying system, as follows. [POSIX.1 {8}]
Files in the system are organized in a hierarchical structure in which all of the nonterminal nodes are directories and all of the terminal nodes are any other type of file. Because multiple directory entries may refer to the same file, the hierarchy is properly described as a ``directed graph.''
2.2.2.59 file mode: An object containing the file permission bits and other characteristics of a file, as described in POSIX.1 {8} 5.6.1. [POSIX.1 {8}]
2.2.2.60 file mode bits: A file's file permission bits, set-user-ID-on-execution bit (S_ISUID), and set-group-ID-on-execution bit (S_ISGID) (see POSIX.1 {8} 5.6.1.2).
2.2.2.61 filename: A name consisting of 1 to {NAME_MAX} bytes used to name a file. The characters composing the name may be selected from the set of all character values excluding the slash character and the null character. The filenames dot and dot-dot have special meaning; see pathname resolution in 2.2.2.104. A filename is sometimes referred to as a pathname component. [POSIX.1 {8}]
2.2.2.62 filename portability: A concept of the underlying system, as follows. [POSIX.1 {8}]
Filenames should be constructed from the portable filename character set because the use of other characters can be confusing or ambiguous in certain contexts.
2.2.2.63 file offset: The byte position in the file where the next I/O operation begins. Each open file description associated with a regular file, block special file, or directory has a file offset. A character special file that does not refer to a terminal device may have a file offset. There is no file offset specified for a pipe or FIFO. [POSIX.1 {8}]
2.2.2.64 file other class: The property of a file indicating access permissions for a process related to the process's user and group identification. A process is in the file other class of a file if the process is not in the file owner class or file group class. [POSIX.1 {8}]
2.2.2.65 file owner class: The property of a file indicating access permissions for a process related to the process's user identification. A process is in the file owner class of a file if the effective user ID of the process matches the user ID of the file. [POSIX.1 {8}]
2.2.2.66 file permission bits: Information about a file that is used, along with other information, to determine if a process has read, write, or execute/search permission to a file. The bits are divided into three parts: owner, group, and other. Each part is used with the corresponding file class of processes. These bits are contained in the file mode, as described in POSIX.1 {8} 5.6.1. The detailed usage of the file permission bits in access decisions is described in file access permissions in 2.2.2.55. [POSIX.1 {8}]
2.2.2.67 file serial number: A per-file-system unique identifier for a file. File serial numbers are unique throughout a file system. [POSIX.1 {8}]
2.2.2.68 file system: A collection of files and certain of their attributes. It provides a name space for file serial numbers referring to those files. [POSIX.1 {8}]
2.2.2.69 file times update: A concept of the underlying system, as follows. [POSIX.1 {8}]
Each file has three distinct associated time values: st_atime, st_mtime, and st_ctime. The st_atime field is associated with the times that the file data is accessed; st_mtime is associated with the times that the file data is modified; and st_ctime is associated with the times that file status is changed. These values are returned in the file characteristics structure, as described in POSIX.1 {8} 5.6.1.
Any function in this standard that is required to read or write file data or change the file status indicates which of the appropriate time-related fields are to be ``marked for update.'' If an implementation of such a function marks for update a time-related field not specified by this standard, this shall be documented, except that any changes caused by pathname resolution need not be documented. For the other functions in this standard (those that are not explicitly required to read or write file data or change file status, but that in some implementations happen to do so), the effect is unspecified.
An implementation may update fields that are marked for update immediately, or it may update such fields periodically. When the fields are updated, they are set to the current time and the update marks are cleared. All fields that are marked for update shall be updated when the file is no longer open by any process, or when a stat() or fstat() is performed on the file. Other times at which updates are done are unspecified. Updates are not done for files on read-only file systems.
2.2.2.70 file type: See file in 2.2.2.54.
2.2.2.71 filter: A command whose operation consists of reading data from standard input or a list of input files and writing data to standard output. Typically, its function is to perform some transformation on the data stream.
2.2.2.72 foreground process: A process that is a member of a foreground process group. [POSIX.1 {8}]
2.2.2.73 foreground process group: A process group whose member processes have certain privileges, denied to processes in background process groups, when accessing their controlling terminal. Each session that has established a connection with a controlling terminal has exactly one process group of the session as the foreground process group of that controlling terminal. See POSIX.1 {8} 7.1.1.4. [POSIX.1 {8}]
2.2.2.74 <form-feed>: A character that in the output stream shall indicate that printing should start on the next page of an output device. The <form-feed> shall be the character designated by '\f' in the C language binding. If <form-feed> is not the first character of an output line, the result is unspecified. It is unspecified whether this character is the exact sequence transmitted to an output device by the system to accomplish the movement to the next page.
2.2.2.75 group ID: A nonnegative integer, which can be contained in an object of type gid_t, that is used to identify a group of system users. Each system user is a member of at least one group. When the identity of a group is associated with a process, a group ID value is referred to as a real group ID, an effective group ID, one of the (optional) supplementary group IDs, or an (optional) saved set-group-ID. [POSIX.1 {8}]
2.2.2.76 hard link: The relationship between two directory entries that represent the same file; the result of an execution of the ln utility or the POSIX.1 {8} link() function.
2.2.2.77 home directory: The current directory associated with a user at the time of login.
2.2.2.78 incomplete line: A sequence of text consisting of one or more non-<newline> characters at the end of the file.
2.2.2.79 invoke: To perform the actions described in 3.9.1.1, except that searching for shell functions and special built-ins is suppressed. See also execute (2.2.2.49).
2.2.2.80 job control: A facility that allows users to selectively stop (suspend) the execution of processes and continue (resume) their execution at a later point. The user typically employs this facility via the interactive interface jointly supplied by the terminal I/O driver and a command interpreter. POSIX.1 {8} conforming implementations may optionally support job control facilities; the presence of this option is indicated to the application at compile time or run time by the definition of the {_POSIX_JOB_CONTROL} symbol; see POSIX.1 {8} 2.9. [POSIX.1 {8}]
2.2.2.81 line: A sequence of text consisting of zero or more non-<newline> characters plus a terminating <newline> character.
2.2.2.82 link: See directory entry in 2.2.2.36.
2.2.2.83 link count: The number of directory entries that refer to a particular file. [POSIX.1 {8}]
2.2.2.84 locale: The definition of the subset of a user's environment that depends on language and cultural conventions; see 2.5.
2.2.2.85 login: The unspecified activity by which a user gains access to the system. Each login shall be associated with exactly one login name. [POSIX.1 {8}]
2.2.2.86 login name: A user name that is associated with a login. [POSIX.1 {8}]
2.2.2.87 mode: A collection of attributes that specifies a file's type and its access permissions. See file access permissions in 2.2.2.55. [POSIX.1 {8}]
2.2.2.88 multicharacter collating element: A sequence of two or more characters that collate as an entity. For example, in some coded character sets, an accented character is represented by a (nonspacing) accent, followed by the letter. Other examples are the Spanish elements ``ch'' and ``ll.''
2.2.2.89 negative response: An input string that matches one of the responses acceptable to the LC_MESSAGES category keyword noexpr, matching an extended regular expression in the current locale. See 2.5.
2.2.2.90 <newline>: A character that in the output stream shall indicate that printing should start at the beginning of the next line. The <newline> shall be the character designated by '\n' in the C language binding. It is unspecified whether this character is the exact sequence transmitted to an output device by the system to accomplish the movement to the next line.
2.2.2.91 NUL: A character with all bits set to zero.
2.2.2.92 null string: See empty string in 2.2.2.45.
2.2.2.93 number-sign: The character ``#''. This standard permits the substitution of the ``pound sign'' graphic defined in ISO/IEC 646 {1} for this symbol when the character set being used has substituted that graphic for the graphic #. The graphic symbol # is always used in this standard.
2.2.2.94 object file: A regular file containing the output of a compiler, formatted as input to a linkage editor for linking with other object files into an executable form. The methods of linking are unspecified and may involve the dynamic linking of objects at run-time. The internal format of an object file is unspecified, but a conforming application shall not assume an object file is a text file.
2.2.2.95 open file: A file that is currently associated with a file descriptor. [POSIX.1 {8}]
2.2.2.96 operand: An argument to a command that is generally used as an object supplying information to a utility necessary to complete its processing. Operands generally follow the options in a command line. See 2.10.1.
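Recognizing a negative response (2.2.2.89) amounts to matching the user's input against an extended regular expression. The sketch below uses the regcomp() and regexec() C interfaces; the expression "^[nN]" is an assumed stand-in for the locale's noexpr value, which a real utility would obtain from the LC_MESSAGES category of the current locale, and is_negative_response() is a hypothetical name.

```c
#include <regex.h>
#include <stddef.h>

/* Assumption: stand-in for the LC_MESSAGES noexpr value of the current locale. */
static const char NOEXPR_ASSUMED[] = "^[nN]";

/* Returns 1 if response matches the extended regular expression noexpr,
 * 0 if it does not, and -1 if the expression cannot be compiled. */
int is_negative_response(const char *response, const char *noexpr)
{
    regex_t re;

    /* REG_NOSUB: only a match/no-match answer is needed. */
    if (regcomp(&re, noexpr, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, response, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Passing the expression as a parameter, rather than hard-coding it inside the function, keeps the locale lookup separate from the matching step.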
2.2.2.97 option: An argument to a command that is generally used to specify changes in the utility's default behavior; see 2.10.1.
2.2.2.98 option-argument: A parameter that follows certain options. In some cases an option-argument is included within the same argument string as the option; in most cases it is the next argument. See 2.10.1.
2.2.2.99 parent directory:
    (1) When discussing a given directory, the directory that both contains a directory entry for the given directory and is represented by the pathname dot-dot in the given directory.
    (2) When discussing other types of files, a directory containing a directory entry for the file under discussion.
This concept does not apply to dot and dot-dot. [POSIX.1 {8}]
2.2.2.100 parent process: See process in 2.2.2.114. [POSIX.1 {8}]
2.2.2.101 parent process ID: An attribute of a new process after it is created by a currently active process. The parent process ID of a process is the process ID of its creator, for the lifetime of the creator. After the creator's lifetime has ended, the parent process ID is the process ID of an implementation-defined system process. [POSIX.1 {8}]
2.2.2.102 pathname: A string that is used to identify a file. A pathname consists of, at most, {PATH_MAX} bytes, including the terminating null character. It has an optional beginning slash, followed by zero or more filenames separated by slashes. If the pathname refers to a directory, it may also have one or more trailing slashes. Multiple successive slashes are considered to be the same as one slash. A pathname that begins with two successive slashes may be interpreted in an implementation-defined manner, although more than two leading slashes shall be treated as a single slash. The interpretation of the pathname is described in pathname resolution in 2.2.2.104. [POSIX.1 {8}]
2.2.2.103 pathname component: See filename in 2.2.2.61. [POSIX.1 {8}]
2.2.2.104 pathname resolution: A concept of the underlying system, as follows. [POSIX.1 {8}]
Pathname resolution is performed for a process to resolve a pathname to a particular file in a file hierarchy. There may be multiple pathnames that resolve to the same file.
Each filename in the pathname is located in the directory specified by its predecessor (for example, in the pathname fragment ``a/b'', file ``b'' is located in directory ``a''). Pathname resolution fails if this cannot be accomplished. If the pathname begins with a slash, the predecessor of the first filename in the pathname is taken to be the root directory of the process (such pathnames are referred to as absolute pathnames). If the pathname does not begin with a slash, the predecessor of the first filename of the pathname is taken to be the current working directory of the process (such pathnames are referred to as ``relative pathnames'').
The interpretation of a pathname component is dependent on the values of {NAME_MAX} and {_POSIX_NO_TRUNC} associated with the path prefix of that component. If any pathname component is longer than {NAME_MAX}, and {_POSIX_NO_TRUNC} is in effect for the path prefix of that component [see pathconf() in POSIX.1 {8} 5.7.1], the implementation shall consider this an error condition. Otherwise, the implementation shall use the first {NAME_MAX} bytes of the pathname component.
The special filename dot refers to the directory specified by its predecessor. The special filename dot-dot refers to the parent directory of its predecessor directory. As a special case, in the root directory, dot-dot may refer to the root directory itself.
A pathname consisting of a single slash resolves to the root directory of the process. A null pathname is invalid.
2.2.2.105 path prefix: A pathname, with an optional ending slash, that refers to a directory. [POSIX.1 {8}]
2.2.2.106 pattern: A sequence of characters used either with regular expression notation (see 2.8) or for pathname expansion (see 3.6.6), as a means of selecting various character strings or pathnames, respectively. The syntaxes of the two patterns are similar, but not identical; this standard always indicates the type of pattern being referred to in the immediate context of the use of the term.
2.2.2.107 period: The character ``.''. The term period is contrasted against dot (2.2.2.38), which is used to describe a specific directory entry.
2.2.2.108 permissions: See file access permissions in 2.2.2.55.
2.2.2.109 pipe: An object accessed by one of the pair of file descriptors created by the POSIX.1 {8} pipe() function. Once created, the file descriptors can be used to manipulate it, and it behaves identically to a FIFO special file when accessed in this way. It has no name in the file hierarchy. [POSIX.1 {8}]
2.2.2.110 portable character set: The set of characters described in 2.4 that is supported on all conforming systems. This term is contrasted against the smaller portable filename character set; see 2.2.2.111.
2.2.2.111 portable filename character set: The set of characters from which portable filenames are constructed.
For a filename to be portable across conforming implementations of this standard, it shall consist only of the following characters:
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    0 1 2 3 4 5 6 7 8 9 . _ -
The last three characters are the period, underscore, and hyphen characters, respectively. The hyphen shall not be used as the first character of a portable filename. Upper- and lowercase letters shall retain their unique identities between conforming implementations. In the case of a portable pathname, the slash character may also be used. [POSIX.1 {8}]
2.2.2.112 printable character: One of the characters included in the print character classification of the LC_CTYPE category in the current locale; see 2.5.2.1.
2.2.2.113 privilege: See appropriate privileges in 2.2.2.6. [POSIX.1 {8}]
2.2.2.114 process: An address space and single thread of control that executes within that address space, and its required system resources. A process is created by another process issuing the POSIX.1 {8} fork() function. The process that issues fork() is known as the parent process, and the new process created by the fork() is known as the child process. [POSIX.1 {8}]
The attributes of processes required by POSIX.2 form a subset of those in POSIX.1 {8}; see 2.9.1.
2.2.2.115 process group: A collection of processes that permits the signaling of related processes. Each process in the system is a member of a process group that is identified by a process group ID. A newly created process joins the process group of its creator. [POSIX.1 {8}]
2.2.2.116 process group ID: The unique identifier representing a process group during its lifetime. A process group ID is a positive integer that can be contained in a pid_t. It shall not be reused by the system until the process group lifetime ends. [POSIX.1 {8}]
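The portable filename rules in 2.2.2.111 can be checked mechanically. The following is a minimal sketch: is_portable_filename() is a hypothetical helper, and the caller supplies the {NAME_MAX} limit (for example, 14, the minimum POSIX.1 guarantees as {_POSIX_NAME_MAX}). The set is spelled out explicitly instead of using isalnum(), whose result depends on the current locale.

```c
#include <stdbool.h>
#include <string.h>

/* The portable filename character set, listed exactly as in 2.2.2.111. */
static const char portable_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789._-";

/* True if name is 1..name_max bytes, does not start with a hyphen,
 * and uses only portable filename characters. */
bool is_portable_filename(const char *name, size_t name_max)
{
    size_t len = strlen(name);

    if (len == 0 || len > name_max)
        return false;
    if (name[0] == '-')
        return false;
    /* strspn() returns the length of the leading run of allowed bytes. */
    return strspn(name, portable_chars) == len;
}
```

Under these assumptions, "report_v2.txt" is accepted with a limit of 14, while "-rf" (leading hyphen) and "a b" (space) are rejected.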
2.2.2.117 process group leader: A process whose process ID is the same as its process group ID. [POSIX.1 {8}]
2.2.2.118 process ID: The unique identifier representing a process. A process ID is a positive integer that can be contained in a pid_t. A process ID shall not be reused by the system until the process lifetime ends. In addition, if there exists a process group whose process group ID is equal to that process ID, the process ID shall not be reused by the system until the process group lifetime ends. A process that is not a system process shall not have a process ID of 1. [POSIX.1 {8}]
2.2.2.119 program: A prepared sequence of instructions to the system to accomplish a defined task. The term program in POSIX.2 encompasses applications written in the Shell Command Language, complex utility input languages (for example, awk, lex, and sed), and high-level languages.
2.2.2.120 read-only file system: A file system that has implementation-defined characteristics restricting modifications. [POSIX.1 {8}]
2.2.2.121 real group ID: The attribute of a process that, at the time of process creation, identifies the group of the user who created the process. See group ID in 2.2.2.75. This value is subject to change during the process lifetime, as described in POSIX.1 {8} 4.2.2 [setgid()]. [POSIX.1 {8}]
2.2.2.122 real user ID: The attribute of a process that, at the time of process creation, identifies the user who created the process. See user ID in 2.2.2.154. This value is subject to change during the process lifetime, as described in POSIX.1 {8} 4.2.2 [setuid()]. [POSIX.1 {8}]
2.2.2.123 regular expression: A pattern (sequence of characters or symbols) constructed according to the rules defined in 2.8.
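The relationships stated in 2.2.2.101, 2.2.2.114, and 2.2.2.115 (a child's parent process ID is its creator's process ID, and a newly created process joins its creator's process group) can be observed with the POSIX.1 fork(), getppid(), and getpgrp() functions. This is an illustrative sketch; child_inherits_ids() is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child observed the expected IDs, 0 if it did not,
 * and -1 if fork() or waitpid() failed. */
int child_inherits_ids(void)
{
    pid_t parent = getpid();
    pid_t pgrp = getpgrp();
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        /* In the child: the creator is still alive (it waits below), so
         * getppid() must equal the creator's PID, and the child starts in
         * the creator's process group. */
        int ok = getppid() == parent && getpgrp() == pgrp;
        _exit(ok ? 0 : 1);   /* _exit() avoids flushing inherited stdio buffers */
    }

    int status;
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The parent waits for the child before returning, so the check runs while the creator's lifetime is still in progress, which is the case in which 2.2.2.101 ties the parent process ID to the creator.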
2.2.2.124 regular file: A file that is a randomly accessible sequence of bytes, with no further structure imposed by the system. [POSIX.1 {8}]
2.2.2.125 relative pathname: See pathname resolution in 2.2.2.104. [POSIX.1 {8}]
2.2.2.126 root directory: A directory, associated with a process, that is used in pathname resolution for pathnames that begin with a slash. [POSIX.1 {8}]
2.2.2.127 saved set-group-ID: An attribute of a process that allows some flexibility in the assignment of the effective group ID attribute, when the saved set-user-ID option is implemented, as described in POSIX.1 {8} 3.1.2 (exec) and 4.2.2 [setgid()]. [POSIX.1 {8}]
2.2.2.128 saved set-user-ID: An attribute of a process that allows some flexibility in the assignment of the effective user ID attribute, when the saved set-user-ID option is implemented, as described in POSIX.1 {8} 3.1.2 (exec) and 4.2.2 [setuid()]. [POSIX.1 {8}]
2.2.2.129 seconds since the Epoch: A value to be interpreted as the number of seconds between a specified time and the Epoch. A Coordinated Universal Time name [specified in terms of seconds (tm_sec), minutes (tm_min), hours (tm_hour), days since January 1 of the year (tm_yday), and calendar year minus 1900 (tm_year)] is related to a time represented as seconds since the Epoch according to the expression below.
If the year < 1970 or the value is negative, the relationship is undefined. If the year >= 1970 and the value is nonnegative, the value is related to a Coordinated Universal Time name according to the expression:
    tm_sec + tm_min*60 + tm_hour*3600 + tm_yday*86400 +
        (tm_year-70)*31536000 + ((tm_year-69)/4)*86400
[POSIX.1 {8}]
2.2.2.130 session: A collection of process groups established for job control purposes. Each process group is a member of a session. A process is considered to be a member of the session of which its process group is a member. A newly created process joins the session of its creator. A process can alter its session membership (see POSIX.1 {8} 4.3.2 [setsid()]). Implementations that support the POSIX.1 {8} setpgid() function (see POSIX.1 {8} 4.3.3) can have multiple process groups in the same session. [POSIX.1 {8}]
2.2.2.131 session leader: A process that has created a session; see POSIX.1 {8} 4.3.2 [setsid()]. [POSIX.1 {8}]
2.2.2.132 session lifetime: The period between when a session is created and the end of the lifetime of all the process groups that remain as members of the session. [POSIX.1 {8}]
2.2.2.133 shell: A program that interprets sequences of text input as commands. It may operate on an input stream or it may interactively prompt and read commands from a terminal.
2.2.2.134 Shell, The: The Shell Command Language Interpreter (see 4.56), a specific instance of a shell.
2.2.2.135 shell script: A file containing shell commands. If the file is made executable, it can be executed by specifying its name as a simple command (see the description of simple command in 3.9.1). Execution of a shell script causes a shell to execute the commands within the script. Alternatively, a shell can be requested to execute the commands in a shell script by specifying the name of the shell script as the operand to the sh utility.
2.2.2.136 signal: A mechanism by which a process may be notified of, or affected by, an event occurring in the system. Examples of such events include hardware exceptions and specific actions by processes. The term signal is also used to refer to the event itself. [POSIX.1 {8}]
2.2.2.137 single-quote: The character ``''', also known as apostrophe.
2.2.2.138 slash: The character ``/'', also known as solidus.
2.2.2.139 source code: When dealing with the Shell Command Language, source code is input to the command language interpreter. The term shell script is synonymous with this meaning.
When dealing with the C Language Bindings Option, source code is input to a C compiler conforming to the C Standard {7}. When dealing with another ISO/IEC conforming language, source code is input to a compiler conforming to that ISO/IEC standard.
Source code also refers to the input statements prepared for the following standard utilities: awk, bc, ed, lex, localedef, make, sed, and yacc.
Source code can also refer to a collection of sources meeting any or all of these meanings.
2.2.2.140 <space>: The character defined in 2.4 as <space>. The <space> character is a member of the space character class of the current locale, but represents the single character, and not all of the possible members of the class. (See 2.2.2.158.)
2.2.2.141 standard error: An output stream usually intended to be used for diagnostic messages.
2.2.2.142 standard input: An input stream usually intended to be used for primary data input.
2.2.2.143 standard output: An output stream usually intended to be used for primary data output.
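The seconds-since-the-Epoch expression in 2.2.2.129 transcribes directly into C. This sketch assumes the struct tm fields named there hold a normalized Coordinated Universal Time with tm_year >= 70; the function name is hypothetical. Because the expression counts every fourth year as a leap year, it would treat 2100 as a leap year and so agrees with the Gregorian calendar only for times before 2101.

```c
#include <time.h>

/* Direct transcription of the 2.2.2.129 expression. Integer division in
 * ((tm_year - 69) / 4) counts the leap days of completed years since 1970.
 * Only meaningful for tm_year >= 70. */
long long posix_seconds_since_epoch(const struct tm *t)
{
    return t->tm_sec
         + t->tm_min * 60LL
         + t->tm_hour * 3600LL
         + t->tm_yday * 86400LL
         + (t->tm_year - 70) * 31536000LL
         + ((t->tm_year - 69) / 4) * 86400LL;
}
```

For 00:00:00 on January 1, 2000 (tm_year = 100, all other fields 0), the expression gives 30 * 31536000 + 7 * 86400 = 946684800.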
2.2.2.144 standard utilities: The utilities defined by this standard in Sections 4, 5, and 6, Annex A, and Annex C, and in similar sections of utility definitions introduced in future revisions of, and supplements to, this standard.

2.2.2.145 stream: An ordered sequence of characters, as described by the C Standard {7}.

2.2.2.146 supplementary group ID: An attribute of a process used in determining file access permissions. A process has up to {NGROUPS_MAX} supplementary group IDs in addition to the effective group ID. The supplementary group IDs of a process are set to the supplementary group IDs of the parent process when the process is created. Whether a process's effective group ID is included in or omitted from its list of supplementary group IDs is unspecified. [POSIX.1 {8}]

2.2.2.147 system: An implementation of this standard.

2.2.2.148 <tab>: The horizontal tab character.

2.2.2.149 terminal [terminal device]: A character special file that obeys the specifications of the POSIX.1 {8} General Terminal Interface. [POSIX.1 {8}]

2.2.2.150 text column: A roughly rectangular block of characters capable of being laid out side-by-side next to other text columns on an output page or terminal screen. The widths of text columns are measured in column positions.

2.2.2.151 text file: A file that contains characters organized into one or more lines. The lines shall not contain NUL characters and none shall exceed {LINE_MAX} bytes in length, including the <newline>. Although POSIX.1 {8} does not distinguish between text files and binary files (see the C Standard {7}), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify text files in their Standard Input or Input Files subclauses.
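2.2.2.146 leaves it unspecified whether the effective group ID also appears among the supplementary group IDs. A portable program that needs to know must therefore check at run time. The following C sketch uses the POSIX.1 {8} getegid() and getgroups() functions. It assumes, as current POSIX.1 specifies, that getgroups() called with a gidsetsize of 0 returns the number of entries without storing any. The function name egid_in_supplementary_groups is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch: reports whether this process's effective group ID also appears in
 * its supplementary group list.  Returns 1 if it does, 0 if it does not, and
 * -1 on error.  Either of the first two answers is conforming. */
int egid_in_supplementary_groups(void)
{
    gid_t egid = getegid();
    gid_t *list;
    int n, m, found = 0;

    n = getgroups(0, NULL);          /* gidsetsize 0: only return the count */
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;                    /* no supplementary group IDs at all */
    list = malloc((size_t)n * sizeof *list);
    if (list == NULL)
        return -1;
    m = getgroups(n, list);          /* now fetch the IDs themselves */
    if (m < 0) {                     /* e.g. the list grew between calls */
        free(list);
        return -1;
    }
    assert(m <= n);                  /* never more entries than room given */
    for (int i = 0; i < m; i++)
        if (list[i] == egid)
            found = 1;
    free(list);
    return found;
}
```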
2.2.2.152 tilde: The character ``~''.

2.2.2.153 user database: See Section 9 in POSIX.1 {8}.

2.2.2.154 user ID: A nonnegative integer, which can be contained in an object of type uid_t, that is used to identify a system user. When the identity of a user is associated with a process, a user ID value is referred to as a real user ID, an effective user ID, or an (optional) saved set-user-ID. [POSIX.1 {8}]

2.2.2.155 user name: A string that is used to identify a user, as described in POSIX.1 {8} 9.1. [POSIX.1 {8}]

2.2.2.156 utility: A program that can be called by name from a shell to perform a specific task, or related set of tasks. This program shall either be an executable file, such as might be produced by a compiler/linker system from computer source code, or a file of shell source code, directly interpreted by the shell. The program may have been produced by the user, provided by the implementor of this standard, or acquired from an independent distributor. The term utility does not apply to the special built-in utilities provided as part of the shell command language; see 3.14. The system may implement certain utilities as shell functions (see 3.9.5) or built-ins (see 2.3), but only an application that is aware of the command search order described in 3.9.1.1 or of performance characteristics can discern differences between the behavior of such a function or built-in and that of a true executable file.

2.2.2.157 <vertical-tab>: The vertical tab character.

2.2.2.158 white space: A sequence of one or more characters that belong to the space character class as defined via the LC_CTYPE category in the current locale. In the POSIX Locale, white space consists of one or more <blank>s (<space>s and <tab>s), <newline>s, <carriage-return>s, <form-feed>s, and <vertical-tab>s.
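In the C binding, membership in the space class of the current locale is what isspace() reports. In the POSIX Locale, which the C Standard {7} calls the ``C'' locale and which a C program starts in, the white space of 2.2.2.158 can therefore be recognized one byte at a time. The following is a sketch under that assumption. The name count_space_class is hypothetical, and a multibyte locale would call for the wide-character classification functions instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Sketch: counts the bytes of s that belong to the space class of the
 * current LC_CTYPE locale, i.e. the characters that make up white space. */
size_t count_space_class(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (isspace((unsigned char)*s))   /* cast: isspace() needs an
                                             unsigned char value or EOF */
            n++;
    return n;
}
```

In the POSIX Locale, count_space_class(" \t\n\v\f\r") returns 6, one for each of <space>, <tab>, <newline>, <vertical-tab>, <form-feed>, and <carriage-return>.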
2.2.2.159 working directory [current working directory]: A directory, associated with a process, that is used in pathname resolution for pathnames that do not begin with a slash.

2.2.2.160 write: To output characters to a file, such as standard output or standard error. Unless otherwise stated, standard output is the default output destination for all uses of the term write.

BEGIN_RATIONALE

2.2.2.161 General Terms Rationale. (This subclause is not a part of P1003.2)

Many of the terms originated in POSIX.1 {8} and are duplicated in this standard to meet editorial requirements. In some cases, there is supplementary text that presents additional information concerning POSIX.2 aspects of the concept.

This standard uses the term character to mean a sequence of one or more bytes representing a single graphic symbol, as defined in POSIX.1 {8}. The deviation in the exact text of the C Standard {7} definition for byte meets the intent of the C Standard {7} Rationale and the developers of POSIX.1 {8}, but clears up the ambiguity raised by the term basic execution character set, which is not defined in POSIX.1 {8}. It is expected that a future version of POSIX.1 {8} will align with the text used here. The octet-minimum requirement is merely a reflection of the {CHAR_BIT} value in POSIX.1 {8} and the C Standard {7}.

The POSIX.1 {8} term file mode is a superset of the POSIX.2 file mode bits. POSIX.1 {8} defines the file mode as the entire mode_t object, which historically holds the file type in its upper four bits, the sticky bit on most implementations, and potentially other nonstandardized attributes. The POSIX.2 file mode bits, by contrast, include only the eleven defined bits.
The terms command and utility are related but have distinct meanings. Command is defined as ``a directive to a shell to perform a specific task.'' The directive can be in the form of a single utility name (for example, ls), or the directive can take the form of a compound command (for example, ls | grep name | pr). A utility is a program that is callable by name from a shell. Issuing only the utility's name to a shell is the equivalent of a one-word command. A utility may be invoked as a separate program that executes in a different process than the command language interpreter, or it may be implemented as a part of the command language interpreter.

For example, the echo command (the directive to perform a specific task) may be implemented such that the echo utility (the logic that performs the task of echoing) is in a separate program. It is then executed in a process different from that of the command language interpreter. Conversely, the logic that performs the echo utility could be built into the command language interpreter. It would then execute in the same process as the command language interpreter.

The terms tool and application can be thought of as being synonymous with utility from the perspective of the operating system kernel. Tools, applications, and utilities have typically run in processes above the kernel level. Tools and utilities have historically been a part of the operating system nonkernel code. They have performed system-related functions such as listing directory contents, checking file systems, repairing file systems, or extracting system status information. Applications have not generally been a part of the operating system. They perform nonsystem-related functions such as word processing, architectural design, mechanical design, workstation publishing, or financial analysis.
Utilities have most frequently been provided by the operating system vendor, and applications by third-party software vendors or by the users themselves. Nevertheless, the standard does not differentiate between tools, utilities, and applications when it comes to receiving services from the system, a shell, or the standard utilities. (For example, the xargs utility invokes another utility; it would be of fairly limited usefulness if users could not run their own applications in place of the standard utilities.) Utilities are not applications in the sense that they are not themselves subject to the restrictions of this standard or any other standard. There is no requirement for grep, stty, or any of the utilities defined here to be any of the classes of Conforming POSIX.2 Applications.

The term text file does not prevent the inclusion of control or other nonprintable characters (other than NUL). Therefore, standard utilities that list text files as inputs or outputs are either able to process the special characters gracefully or they explicitly describe their limitations within their individual subclauses.

The definition of text file has caused a good deal of controversy. The only difference between text and binary here is that text files have lines of no more than {LINE_MAX} bytes, with no NUL characters, each terminated by a <newline> character. The definition allows a file with a single <newline>, but not a totally empty file, to be called a text file. If a file ends with an incomplete line, it is not strictly a text file by this definition.
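Taken together, the text file definition in 2.2.2.151 and the points just made reduce to a single pass over the bytes of a file. A lone <newline> qualifies, while an empty file or one ending in an incomplete line does not. The following is a minimal C sketch. It assumes the whole file is already in memory and takes the {LINE_MAX} value as a parameter; a caller might obtain that value with sysconf(_SC_LINE_MAX). It checks bytes only, not whether they form valid characters in the current locale. The name is_text_file is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: returns 1 if buf[0..len) is a text file in the sense of 2.2.2.151.
 * That means at least one line, no NUL bytes, every line terminated by a
 * <newline>, and no line longer than line_max bytes counting the <newline>. */
int is_text_file(const char *buf, size_t len, long line_max)
{
    size_t line_len = 0;            /* bytes in the current line so far */

    assert(line_max > 0);
    if (len == 0)
        return 0;                   /* an empty file contains no lines */
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\0')
            return 0;               /* NUL characters are not allowed */
        line_len++;
        if ((long)line_len > line_max)
            return 0;               /* line exceeds {LINE_MAX}, <newline> included */
        if (buf[i] == '\n')
            line_len = 0;           /* line complete; start the next one */
    }
    return line_len == 0;           /* an incomplete last line disqualifies */
}
```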
A related point is that the <newline> character referred to in this standard is not some generic line separator, but a single character. Files created on systems that use multiple characters to end lines are not portable to all POSIX systems without some translation process unspecified by this standard.

The term hard link is historically derived. In systems without extensions to ln, it is a synonym for link. The concept of a symbolic link originated with BSD systems, and the term hard is used to differentiate between the two types of links.

There are some terms used that are undefined in POSIX.2, POSIX.1 {8}, or the C Standard {7}. The working group believes that these terms have a ``common usage,'' and that a definition in POSIX.2 would not be appropriate. Terms in this category include, but are not limited to, the following: application, character set, login session, user. Good sources for general terms of this type are the ISO/AFNOR Dictionary of Computer Science {B12} and the IEEE Dictionary {B18}.

The term file name was defined in previous drafts to be a synonym for pathname. It was removed in the face of objections that it was too close to filename, which means something different (a pathname component). The general solution to this has been to use the term file in parameter names, rather than file_name, and to make more liberal use of the correct term, pathname. An alternate solution has been to replace file name with the name of the file.

Many character names are included in this subclause. Because of historical usage, some of these names are a bit different from the ones used in international standards for character sets, such as ISO/IEC 646 {1}.
It was felt that many more UNIX system people than character set lawyers would be reading and reviewing the standard, so the former group was the one accommodated. On the other hand, the precise definitions of <blank>, <space>, and white space have replaced common usage (where they have been used virtually interchangeably), as the standard attempts to balance readability against precision.

In earlier drafts, the character pairs ( ), [ ], and { } were referred to as ``opening'' and ``closing'' parentheses, brackets, and braces. These were changed to the current ``left'' and ``right.'' When the characters are used to express natural language, the terms ``open'' and ``close'' imply text direction more strongly than ``left'' and ``right.'' By POSIX.2 definition, the <left-parenthesis> character will always be mapped to the glyph '(' regardless of the locale. But when reading right-to-left, the opening punctuation of a parenthesized text segment would be ')'. The <left-parenthesis> and <right-parenthesis> forms are the correct ones because the punctuation appears on the left and right, respectively, of the parenthesized text regardless of the direction in which one might be reading the text.

The <backspace> character and the ERASE special character defined in POSIX.1 {8} should not be confused. The uses of the <backspace> character and of the ERASE special character defined in the POSIX.1 {8} termios clause on special characters (7.1.1.9) are distinct, even though the ERASE special character may be set to <backspace>.

In most one-byte character sets, such as ASCII, the concept of column positions is identical to character positions and to bytes. Therefore, it has been historically acceptable for some implementations to describe line folding or tab stops or table column alignment in terms of bytes or character positions.
Other character sets pose complications, as they can have internal representations longer than one octet and they can have displayable characters that have different widths on the terminal screen or printer.

In this standard the term column positions has been defined to mean character--not byte--positions in input files (such as ``column position 7 of the FORTRAN input''). Output files describe the column position in terms of the display width of the narrowest printable character in the character set, adjusted to fit the characteristics of the output device. It is very possible that n column positions will not be able to hold n characters in some character sets, unless all of those characters are of the narrowest width. It is assumed that the implementation is aware of the width of the various characters, deriving this information from the value of LC_CTYPE. It can thus determine how many column positions to allot for each character in those utilities where it is important. This information is not available to the portable application writer because POSIX.2 provides no interface specification to retrieve such information.

The term column position was used instead of the more natural column because the latter is frequently used in the standard in the different contexts of columns of figures, columns of table values, etc. Wherever confusion might result, these latter types of columns are referred to as text columns.

The definition of binary file was removed, as the term is not used in the standard.

The ISO/IEC 646 {1} character set standard permits substitution of national currency symbols for the character $ in the ``reference character set'' (which is the same as ASCII).
This standard permits the substitution only of the actual characters shown in ISO/IEC 646 {1}: currency sign for the dollar sign and pound sign for the number sign. This document uses the latter names and their symbols. It is nevertheless valid for an implementation to accept, for instance, the pound sign (£) as a comment character in the shell, if that is what the locale's character set uses instead of the number sign (#). Other variations of national currency symbols are not allowed, per the request of the WG15 POSIX working group.

The term stream is not related to System V's STREAMS communications facility; it is derived from historical UNIX system usage and has been made official by the C Standard {7}. The POSIX.2 standard makes no differentiation between C's text stream and binary stream.

The formula used in the POSIX.1 {8} definition of seconds since the Epoch is not perfect in all cases. See the related rationale in POSIX.1 {8}.

END_RATIONALE

2.2.3 Abbreviations

For the purposes of this standard, the following abbreviations apply:

2.2.3.1 C Standard: ISO/IEC 9899: ..., Information processing systems--Programming languages--C {7}.

2.2.3.2 ERE: An Extended Regular Expression, as defined in 2.8.4.

2.2.3.3 LC_*: An abbreviation used to represent all of the environment variables named in 2.6 whose names begin with the characters ``LC_''.

2.2.3.4 POSIX.1: ISO/IEC 9945-1: 1990: Information technology--Portable Operating System Interface (POSIX)--Part 1: System Application Program Interface (API) [C Language] {8}.

2.2.3.5 POSIX.2: This standard.
2.2.3.6 RE [BRE]: A Basic Regular Expression, as defined in 2.8.3.

2.3 Built-in Utilities

Any of the standard utilities may be implemented as regular built-in utilities within the command language interpreter. This is usually done to increase the performance of frequently used utilities or to achieve functionality that would be more difficult in a separate environment. The utilities named in Table 2-2 are frequently provided in built-in form. All of the utilities named in the table have special properties in terms of command search order within the shell, as described in 3.9.1.1.

    Table 2-2 - Regular Built-in Utilities

        cd        false     kill      true      wait
        command   getopts   read      umask

However, all of the standard utilities, including the regular built-ins in the table but not the special built-ins described in 3.14, shall be implemented so that they can be accessed via the POSIX.1 {8} exec family of functions (if the underlying operating system provides the services of such a family to application programs). They shall also be implemented so that they can be invoked directly by those standard utilities that require it (env, find, nohup, xargs).
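The requirement that standard utilities be reachable through the exec family matters most when an application needs the utility's own exit status. As the rationale for this subclause notes, system() and popen() return the status of the command language interpreter instead. A minimal C sketch of the exec approach follows. It assumes a POSIX.1 {8} system providing fork(), execvp(), and waitpid(). The function run_utility is hypothetical. It reports 127 when the exec fails, borrowing the status a POSIX shell reports for a command that cannot be found.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch: runs a utility via the exec family and returns the utility's own
 * exit status (0-255).  Returns -1 if the process could not be created or
 * did not exit normally.  argv[0] is searched for along PATH by execvp(). */
int run_utility(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                      /* child: become the utility */
        execvp(argv[0], argv);
        _exit(127);                      /* reached only if execvp() failed */
    }
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)              /* retry only if interrupted */
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with the argument vector {"true", NULL}, the sketch returns 0. With {"false", NULL}, it returns a nonzero status.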
Since versions shall be provided for all utilities except for those listed previously, an application running on a system that conforms to both POSIX.1 {8} and Section 7 of this standard can use the exec family of functions to execute any of these utilities. It can do so in addition to using the shell command interface in 7.1 defined by this standard [such as the system() and popen() functions in the C binding].

BEGIN_RATIONALE

2.3.1 Built-in Utilities Rationale. (This subclause is not a part of P1003.2)

In earlier drafts, the table of built-ins implied two things to a conforming application: these may be built-ins, and these need not be executable. The second implication has now been removed, and all utilities can be exec-ed. There is no requirement that these actually be built into the shell itself, but many shells will want to do so because 3.9.1.1 requires that they be found prior to the PATH search. The shell could satisfy its requirements by keeping a list of the names and directly accessing the file-system versions regardless of PATH. Providing all of the required functionality for utilities such as cd or read would be more difficult.

There were originally three justifications for allowing the omission of exec-able versions:

(1) This would require wasting space in the file system, at the expense of very small systems. However, it has been pointed out that all nine utilities in the table can be provided with nine links to a single-line shell script:

        $0 "$@"

(2) There is no sense in requiring invocation of utilities like cd because they have no value outside the shell environment or cannot be useful in a child process. However, counter-examples always seemed to be available for even the strangest cases:

        find . -type d -exec cd {} \; -exec foo {} \;
            (which invokes foo on accessible directories)

        ps ... | sed ... | xargs kill

        find . -exec true \; -a ...
            (where true is used for temporary debugging)

(3) It is confusing to have something such as kill that can easily be in the file system in the base standard, but requires built-in status for the UPE (for the % job control job ID notation).

It was decided that it was more appropriate to describe the required functionality (rather than the implementation) to the system implementors and let them decide how to satisfy it. On the other hand, there were objections raised during balloting that any distinction like this between utilities was not useful to applications and that the cost to correct it was small. These arguments were ultimately the most effective.

There were varying reasons for including utilities in the table of built-ins:

cd, getopts, read, umask, wait
    The functionality of these utilities is performed more simply within the context of the current process. An example can be taken from the usage of the cd utility. The purpose of the utility is to change the working directory for subsequent operations. The actions of cd affect the process in which cd is executed and all subsequent child processes of that process. Based on the POSIX.1 {8} process model, changes in the process environment of a child process have no effect on the parent process. If the cd utility were executed from a child process, the working directory change would be effective only in the child process. Child processes initiated subsequent to the child process that executed the cd utility would not have a changed working directory relative to the parent process.
command
    This utility was placed in the table primarily to protect scripts that are concerned about their PATH being manipulated. The ``secure'' shell script example in 4.12.10 would not be possible if a PATH change retrieved an alien version of command. (An alternative would have been to implement getconf as a built-in, but it was felt that it carried too many changing configuration strings to require in the shell.)

kill
    Since common extensions to kill (including the planned User Portability Extension) provide optional job control functionality using shell notation (%1, %2, etc.), some implementations would find it extremely difficult to provide this outside the shell.

true, false
    These are in the table as a courtesy to programmers who wish to use the ``while true'' shell construct without protecting true from PATH searches. (It is acknowledged that ``while :'' also works, but the idiom with true is historically pervasive.)

All utilities, including those in the table, are accessible via the functions in 7.1.1 or 7.1.2 [such as system() or popen()]. There are situations where the return functionality of system() and popen() is not desirable. Applications that require the exit status of the invoked utility will not be able to use system() or popen(), since the exit status returned is that of the command language interpreter rather than that of the invoked utility. The alternative for such applications is the use of the exec family. (The text concerning conformance to POSIX.1 {8} was included because, where exec is not provided in the underlying system, there is no way to require that utilities be exec-able.)

END_RATIONALE

2.4 Character Set

Conforming implementations shall support one or more coded character sets.
Each supported coded character set shall include the portable character set specified in Table 2-3. The table defines the characters in the portable character set and the corresponding symbolic character names used to identify each character in a character set description file. The names are chosen to correspond closely with character names defined in other international standards. The table contains more than one symbolic character name for characters whose traditional name differs from the chosen name.

This standard places only the following requirements on the encoded values of the characters in the portable character set:

(1) If the encoded values associated with each member of the portable character set are not invariant across all locales supported by the implementation, the results achieved by an application accessing those locales are unspecified.

(2) The encoded values associated with the digits '0' to '9' shall be such that the value of each character after '0' shall be one greater than the value of the previous character.

(3) A null character, NUL, which has all bits set to zero, shall be in the set of characters.

Conforming implementations shall support certain character and character set attributes, as defined in 2.5.1.

2.4.1 Character Set Description File

Implementations shall provide a character set description file for at least one coded character set supported by the implementation. These files are referred to elsewhere in this standard as charmap files. It is implementation-defined whether or not users or applications can provide additional character set description files. If such a capability is supported, the system documentation shall describe the rules for the creation of such files.

Each character set description file shall define characteristics for the coded character set and the encoding for the characters specified in Table 2-3, and may define encoding for additional characters supported by the implementation.
Other information about the coded character set may also be in the file. Coded character set character values shall be defined using symbolic character names followed by character encoding values.

    Table 2-3 - Character Set and Symbolic Names

    Symbolic Name   Glyph     Symbolic Name   Glyph     Symbolic Name   Glyph
             <N>;<O>;<P>;<Q>;<R>;<S>;<T>;<U>;<V>;<W>;<X>;<Y>;<Z>
    #
    lower    <a>;<b>;<c>;<d>;<e>;<f>;<g>;<h>;<i>;<j>;<k>;<l>;<m>;\
             <n>;<o>;<p>;<q>;<r>;<s>;<t>;<u>;<v>;<w>;<x>;<y>;<z>
    #
    digit    <zero>;<one>;<two>;<three>;<four>;<five>;<six>;<seven>;<eight>;<nine>
    #
    space    <tab>;<newline>;<vertical-tab>;<form-feed>;<carriage-return>;<space>
    #
    cntrl    <alert>;<backspace>;<tab>;<newline>;<vertical-tab>;\
             <form-feed>;<carriage-return>;\
             <NUL>;<SOH>;<STX>;<ETX>;<EOT>;<ENQ>;<ACK>;<SO>;\
             <SI>;<DLE>;<DC1>;<DC2>;<DC3>;<DC4>;<NAK>;<SYN>;\
             <ETB>;<CAN>;<EM>;<SUB>;<ESC>;<IS4>;<IS3>;<IS2>;\
             <IS1>;<DEL>
    #
    punct    <exclamation-mark>;<quotation-mark>;<number-sign>;\
             <dollar-sign>;<percent-sign>;<ampersand>;<apostrophe>;\
             <left-parenthesis>;<right-parenthesis>;<asterisk>;\
             <plus-sign>;<comma>;<hyphen>;<period>;<slash>;\
             <colon>;<semicolon>;<less-than-sign>;<equals-sign>;\
             <greater-than-sign>;<question-mark>;<commercial-at>;\
             <left-square-bracket>;<backslash>;<right-square-bracket>;\
             <circumflex>;<underscore>;<grave-accent>;<left-curly-bracket>;\
             <vertical-line>;<right-curly-bracket>;<tilde>
    #
    xdigit   <zero>;<one>;<two>;<three>;<four>;<five>;<six>;<seven>;<eight>;\
             <nine>;<A>;<B>;<C>;<D>;<E>;<F>;<a>;<b>;<c>;<d>;<e>;<f>
    #
    blank    <space>;<tab>
    #
    toupper  (<a>,<A>);(<b>,<B>);(<c>,<C>);(<d>,<D>);(<e>,<E>);\
             (<f>,<F>);(<g>,<G>);(<h>,<H>);(<i>,<I>);(<j>,<J>);\
             (<k>,<K>);(<l>,<L>);(<m>,<M>);(<n>,<N>);(<o>,<O>);\
             (<p>,<P>);(<q>,<Q>);(<r>,<R>);(<s>,<S>);(<t>,<T>);\
             (<u>,<U>);(<v>,<V>);(<w>,<W>);(<x>,<X>);(<y>,<Y>);(<z>,<Z>)
    #
    tolower  (<A>,<a>);(<B>,<b>);(<C>,<c>);(<D>,<d>);(<E>,<e>);\
             (<F>,<f>);(<G>,<g>);(<H>,<h>);(<I>,<i>);(<J>,<j>);\
             (<K>,<k>);(<L>,<l>);(<M>,<m>);(<N>,<n>);(<O>,<o>);\
             (<P>,<p>);(<Q>,<q>);(<R>,<r>);(<S>,<s>);(<T>,<t>);\
             (<U>,<u>);(<V>,<v>);(<W>,<w>);(<X>,<x>);(<Y>,<y>);(<Z>,<z>)
    END LC_CTYPE

The LC_CTYPE category shall define character classification, case conversion, and other character attributes. In addition, a series of characters can be represented by three adjacent periods representing an ellipsis symbol (``...''). The ellipsis specification shall be interpreted as meaning that all values between the values preceding and following it represent valid characters. The ellipsis specification shall be valid only within a single encoded character set. An ellipsis shall be interpreted as including in the list all characters with an encoded value higher than the encoded value of the character preceding the ellipsis and lower than the encoded value of the character following the ellipsis.

Example: \x30;...;\x39; includes in the character class all characters with encoded values between the endpoints.

The following keywords shall be recognized. In the descriptions, the term ``automatically included'' means that it shall not be an error to either include the referenced characters or to omit them; the implementation shall provide them if missing and accept them silently if present.

copy      Specify the name of an existing locale to be used as the source for the definition of this category. If this keyword is specified, no other keyword shall be specified.

upper     Define characters to be classified as uppercase letters. No character specified for the keywords cntrl, digit, punct, or space shall be specified.
          If this keyword is not specified, the uppercase letters A through Z, as defined in Table 2-3 (see 2.4.1), shall automatically belong to this class, with implementation-defined character values.

lower     Define characters to be classified as lowercase letters. No character specified for the keywords cntrl, digit, punct, or space shall be specified. If this keyword is not specified, the lowercase letters a through z, as defined in Table 2-3 (see 2.4.1), shall automatically belong to this class, with implementation-defined character values.

alpha     Define characters to be classified as letters. No character specified for the keywords cntrl, digit, punct, or space shall be specified. In addition, characters classified as either upper or lower shall automatically belong to this class.

digit     Define the characters to be classified as numeric digits. Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 shall be specified, and in ascending sequence by numerical value. If this keyword is not specified, the digits 0 through 9, as defined in Table 2-3 (see 2.4.1), shall automatically belong to this class, with implementation-defined character values.

space     Define characters to be classified as white-space characters. No character specified for the keywords upper, lower, alpha, digit, graph, or xdigit shall be specified. If this keyword is not specified, the characters <space>, <form-feed>, <newline>, <carriage-return>, <tab>, and <vertical-tab>, as defined in Table 2-3 (see 2.4.1), shall automatically belong to this class, with implementation-defined character values. Any characters included in the class blank shall be automatically included.

cntrl     Define characters to be classified as control characters. No character specified for the keywords upper, lower, alpha, digit, punct, graph, print, or xdigit shall be specified.

punct     Define characters to be classified as punctuation characters. Neither the <space> character nor any character specified for the keywords upper, lower, alpha, digit, cntrl, or xdigit shall be specified.

graph     Define characters to be classified as printable characters, not including the <space> character. If this keyword is not specified, characters specified for the keywords upper, lower, alpha, digit, xdigit, and punct shall belong to this character class. No character specified for the keyword cntrl shall be specified.

print     Define characters to be classified as printable characters, including the <space> character. If this keyword is not provided, characters specified for the keywords upper, lower, alpha, digit, xdigit, and punct, and the <space> character, shall belong to this character class. No character specified for the keyword cntrl shall be specified.

xdigit    Define the characters to be classified as hexadecimal digits. Only the characters defined for the class digit shall be specified, in ascending sequence by numerical value, followed by one or more sets of six characters representing the hexadecimal digits 10 through 15, with each set in ascending order (for example A, B, C, D, E, F, a, b, c, d, e, f). If this keyword is not specified, the digits 0 through 9, the uppercase letters A through F, and the lowercase letters a through f, as defined in Table 2-3 (see 2.4.1), shall automatically belong to this class, with implementation-defined character values.

blank     Define characters to be classified as <blank> characters. If this keyword is unspecified, the characters <space> and <tab> shall belong to this character class.

toupper   Define the mapping of lowercase letters to uppercase letters. The operand shall consist of character pairs, separated by semicolons. The characters in each character pair shall be separated by a comma and the pair enclosed by parentheses.
The first character in each pair shall be the lowercase letter, the second the corresponding uppercase letter. Only characters specified for the keywords lower and upper shall be specified. If this keyword is not specified, the lowercase letters a through z and their corresponding uppercase letters A through Z, as defined in Table 2-3 (see 2.4.1), shall automatically be included, with implementation-defined character values.

2.5 Locale 79    P1003.2/D11.2 INFORMATION TECHNOLOGY--POSIX

tolower   Define the mapping of uppercase letters to lowercase letters. The operand shall consist of character pairs, separated by semicolons. The characters in each character pair shall be separated by a comma and the pair enclosed by parentheses. The first character in each pair shall be the uppercase letter, the second the corresponding lowercase letter. Only characters specified for the keywords lower and upper shall be specified. The tolower keyword is optional. If it is specified, the uppercase letters A through Z, as defined in Table 2-3, and their corresponding lowercase letters shall be specified. If this keyword is not specified, the mapping shall be the reverse of the mapping specified for toupper.

Table 2-6 shows the allowed character class combinations.
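The following Python sketch illustrates the toupper and tolower operand format. It parses an operand such as (a,A);(b,B) into a mapping, checks that only characters from the lower and upper classes appear, and derives the default tolower mapping as the reverse of toupper. It is a minimal sketch under stated assumptions, not part of this standard. The helper names parse_case_pairs, uses_only_case_classes, and default_tolower are hypothetical. Symbolic names such as <a> are treated as opaque tokens, and escape sequences and line continuations are not handled.

```python
def parse_case_pairs(operand: str) -> dict[str, str]:
    """Parse "(a,A);(b,B)" into {"a": "A", "b": "B"} (hypothetical helper).

    Splitting on ';' and ',' assumes neither character is itself one of
    the case-mapped characters.
    """
    mapping: dict[str, str] = {}
    for pair in operand.split(";"):
        pair = pair.strip()
        if not (pair.startswith("(") and pair.endswith(")")):
            raise ValueError(f"pair not enclosed in parentheses: {pair!r}")
        first, comma, second = pair[1:-1].partition(",")
        if not comma:
            raise ValueError(f"pair lacks a separating comma: {pair!r}")
        mapping[first.strip()] = second.strip()
    return mapping


def uses_only_case_classes(mapping: dict[str, str],
                           lower: set[str], upper: set[str]) -> bool:
    """True if every character in every pair is in the lower or upper class."""
    allowed = lower | upper
    return all(a in allowed and b in allowed for a, b in mapping.items())


def default_tolower(toupper: dict[str, str]) -> dict[str, str]:
    """Absent a tolower keyword, use the reverse of the toupper mapping."""
    return {upper: lower for lower, upper in toupper.items()}


if __name__ == "__main__":
    up = parse_case_pairs("(<a>,<A>);(<b>,<B>)")
    print(up)                   # {'<a>': '<A>', '<b>': '<B>'}
    print(default_tolower(up))  # {'<A>': '<a>', '<B>': '<b>'}
```

Reversing toupper is only well defined when no two lowercase letters map to the same uppercase letter. With such a mapping, the sketch keeps the last pair it sees.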
Table 2-6 - Valid Character Class Combinations
______________________________________________________________________________________
         |                         Can Also Belong To
In Class |  upper  lower  alpha  digit  space  cntrl  punct  graph  print xdigit  blank
_________|____________________________________________________________________________
upper    |    -      -      M      X      X      X      X      D      D      -      X
lower    |    -      -      M      X      X      X      X      D      D      -      X
alpha    |    -      -      -      X      X      X      X      D      D      -      X
digit    |    X      X      X      -      X      X      X      D      D      -      X
space    |    X      X      X      X      -      -      *      *      *      X      -
cntrl    |    X      X      X      X      -      -      X      X      X      X      -
punct    |    X      X      X      X      -      X      -      D      D      X      -
graph    |    -      -      -      -      -      X      -      -      -      -      -
print    |    -      -      -      -      -      X      -      -      -      -      -
xdigit   |    -      -      -      -      X      X      X      D      D      -      X
blank    |    X      X      X      X      M      -      *      *      *      X      -
_________|____________________________________________________________________________

NOTES:

(1) Explanation of codes:
      M   Always
      D   Default; belongs to class if not specified
      -   Permitted
      X   Mutually exclusive
      *   See note (2)

(2) The <space> character, which is part of the space and blank classes, cannot belong to punct or graph, but shall automatically belong to the print class. Other space or blank characters can be classified as punct, graph, and/or print.

BEGIN_RATIONALE

2.5.2.1.1 LC_CTYPE Rationale. (This subclause is not a part of P1003.2)

The LC_CTYPE category is used primarily to define the encoding-independent aspects of a character set, such as character classification. In addition, certain encoding-dependent characteristics are also defined for an application via the LC_CTYPE category. POSIX.2 does not mandate
that the encoding used in the locale is the same as the one used by the application. An implementation may decide that it is advantageous to define locales in a system-wide encoding, rather than having multiple, logically identical locales in different encodings, and to convert from the application encoding to the system-wide encoding on usage. Other implementations could require encoding-dependent locales. In either case, the LC_CTYPE attributes that are directly dependent on the encoding, such as mb_cur_max and the display width of characters, are not user-specifiable in a locale source and are consequently not defined as keywords.

Because the LC_CTYPE character classes are based on the C Standard {7} character-class definition, the category does not support multicharacter elements. For instance, the German character ß (sharp s) is traditionally classified as a lowercase letter. There is no corresponding uppercase letter; in proper capitalization of German text, ß is replaced by SS, i.e., by two characters. This kind of conversion is outside the scope of the toupper and tolower keywords.

Where POSIX.2 specifies that only certain characters can be specified, as for the keywords digit and xdigit, the specified characters must be from the portable character set, as shown. As an example, only the Arabic digits 0 through 9 are acceptable as digits.

The character classes digit, xdigit, lower, upper, and space have a set of automatically included characters. These only need to be specified if the character values (i.e., encoding) differ from the implementation default values.

The definition of character class digit requires that only ten characters (the ones defining digits) can be specified; alternate digits (e.g., Hindi or Kanji) cannot be specified here.
However, the encoding may vary if an implementation supports more than one encoding.

The definition of character class xdigit requires that the characters included in character class digit are also included here, and allows for different symbols for the hexadecimal digits 10 through 15.

END_RATIONALE

2.5.2.2 LC_COLLATE

A collation sequence definition shall define the relative order between collating elements (characters and multicharacter collating elements) in the locale. This order is expressed in terms of collation values; i.e., by assigning each element one or more collation values (also known as collation weights). This does not imply that implementations shall assign such values, but that ordering of strings using the resultant collation definition in the locale shall behave as if such assignment is done and used in the collation process. The collation sequence definition shall be used by regular expressions, pattern matching, and sorting. The following capabilities are provided:

(1) Multicharacter collating elements. Specification of multicharacter collating elements (i.e., sequences of two or more characters to be collated as an entity).

(2) User-defined ordering of collating elements. Each collating element shall be assigned a collation value defining its order in the character (or basic) collation sequence. This ordering is used by regular expressions and pattern matching and, unless collation weights are explicitly specified, also as the collation weight to be used in sorting.

(3) Multiple weights and equivalence classes. Collating elements can be assigned one or more (up to the limit {COLL_WEIGHTS_MAX}) collating weights for use in sorting. The first weight is hereafter referred to as the primary weight.

(4) One-to-Many mapping.
A single character is mapped into a string of collating elements.

(5) Many-to-Many substitution. A string of one or more characters is substituted by another string (or by an empty string, i.e., the character or characters shall be ignored for collation purposes).

(6) Equivalence class definition. Two or more collating elements have the same collation value (primary weight).

(7) Ordering by weights. When two strings are compared to determine their relative order, the two strings are first broken up into a series of collating elements, and each successive pair of elements is compared according to the relative primary weights for the elements. If they are equal, and more than one weight has been assigned, then the pairs of collating elements are recompared according to the relative subsequent weights, until either a pair of collating elements compares unequal or the weights are exhausted.

The following keywords shall be recognized in a collation sequence definition. They are described in detail in the following subclauses.

copy                Specify the name of an existing locale to be used as the source for the definition of this category. If this keyword is specified, no other keyword shall be specified.

collating-element   Define a collating-element symbol representing a multicharacter collating element. This keyword is optional.

collating-symbol    Define a collating symbol for use in collation order statements. This keyword is optional.

order_start         Define collation rules. This statement is followed by one or more collation order statements, assigning character collation values and collation weights to collating elements.

order_end           Specify the end of the collation-order statements.
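Items (1) and (7) can be modelled together in a few lines of Python. The sketch below is illustrative only. The weight table and the helper names split_elements and collate_compare are hypothetical, and every collating element is assumed to carry exactly one weight per level. A string that runs out of elements first is assumed to collate first; the text above does not state that case explicitly. Strings are split into collating elements by greedy longest match. They are then compared one level at a time: all primary weights first, and the secondary weights only if the primary weights tie.

```python
# Hypothetical two-level weight table: element -> (primary, secondary).
# "ch" is a multicharacter collating element ordered after "h", and
# upper- and lowercase letters share a primary weight.
WEIGHTS = {
    "a": (1, 1), "A": (1, 2),
    "b": (2, 1), "B": (2, 2),
    "c": (3, 1), "h": (4, 1),
    "ch": (5, 1),
}


def split_elements(s: str, multichar: set[str]) -> list[str]:
    """Break s into collating elements, preferring the longest match."""
    longest = max((len(m) for m in multichar), default=1)
    elements, i = [], 0
    while i < len(s):
        for n in range(min(longest, len(s) - i), 0, -1):
            piece = s[i:i + n]
            if n == 1 or piece in multichar:
                elements.append(piece)
                i += n
                break
    return elements


def collate_compare(s1: str, s2: str, weights=WEIGHTS) -> int:
    """Return -1, 0 or 1, comparing one weight level at a time."""
    multichar = {e for e in weights if len(e) > 1}
    e1 = split_elements(s1, multichar)
    e2 = split_elements(s2, multichar)
    levels = len(next(iter(weights.values())))
    for level in range(levels):
        k1 = [weights[e][level] for e in e1]
        k2 = [weights[e][level] for e in e2]
        if k1 != k2:  # list comparison: a shorter prefix sorts first
            return -1 if k1 < k2 else 1
    return 0


if __name__ == "__main__":
    print(collate_compare("a", "A"))   # -1: tie on primary, a < A on secondary
    print(collate_compare("ch", "h"))  # 1: "ch" is one element, placed after "h"
```

Greedy longest match is one reasonable way to split a string into collating elements; the text does not prescribe a particular algorithm.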
Table 2-7 - LC_COLLATE Category Definition in the POSIX Locale
_________________________________________________________________________
LC_COLLATE
# This is the POSIX Locale definition for the LC_COLLATE category.
# The order is the same as in the ASCII code set.
order_start forward

order_end
#
END LC_COLLATE
_________________________________________________________________________

2.5.2.2.1 collating-element Keyword

In addition to the collating elements in the character set, the collating-element keyword shall be used to define multicharacter collating elements. The syntax is:

     "collating-element %s from %s\n", <collating-symbol>, <string>

The <collating-symbol> operand shall be a symbolic name, enclosed between angle brackets (< and >). It shall not duplicate any symbolic name in the current charmap file (if any), or any other symbolic name defined in this collation definition. The string operand shall be a string of two or more characters that shall collate as an entity. A <collating-element> defined via this keyword is recognized only within the LC_COLLATE category.

Example:

     collating-element from
     collating-element from
     collating-element from ll

2.5.2.2.2 collating-symbol Keyword

This keyword shall be used to define symbols for use in collation sequence statements; i.e., between the order_start and the order_end keywords. The syntax is:

     "collating-symbol %s\n", <collating-symbol>

The <collating-symbol> shall be a symbolic name, enclosed between angle brackets (< and >). It shall not duplicate any symbolic name in the current charmap file (if any), or any other symbolic name defined in this collation definition. A <collating-symbol> defined via this keyword is recognized only within the LC_COLLATE category.

Example:

     collating-symbol
     collating-symbol

2.5.2.2.3 order_start Keyword

The order_start keyword shall precede collation order entries. It also defines the number of weights for this collation sequence definition, and other collation rules. The syntax of the order_start keyword is:

     "order_start %s;%s;...;%s\n", <sort-rules>, <sort-rules> ...

The operands to the order_start keyword are optional. If present, the operands define rules to be applied when strings are compared. The number of operands defines how many weights each element is assigned; if no operands are present, one forward operand is assumed. If present, the first operand defines rules to be applied when comparing strings using the first (primary) weight, the second when comparing strings using the second weight, and so on. Operands shall be separated by semicolons (;). Each operand shall consist of one or more collation directives, separated by commas (,).
If the number of operands exceeds the {COLL_WEIGHTS_MAX} limit, the utility shall issue a warning message. The following directives shall be supported:

forward    Specifies that comparison operations for the weight level shall proceed from the start of the string towards the end of the string.

backward   Specifies that comparison operations for the weight level shall proceed from the end of the string towards the beginning of the string.

position   Specifies that comparison operations for the weight level shall consider the relative position of non-IGNOREd elements in the strings. The string containing a non-IGNOREd element after the fewest IGNOREd collating elements from the start of the compare shall collate first. If both strings contain a non-IGNOREd character in the same relative position, the collating values assigned to the elements shall determine the ordering. In case of equality, subsequent non-IGNOREd characters shall be considered in the same manner.

The directives forward and backward are mutually exclusive.

Example:

     order_start forward;backward

If no operands are specified, a single forward operand shall be assumed.

2.5.2.2.4 Collation Order

The order_start keyword shall be followed by collating element entries. The syntax for the collating element entries is:

     "%s %s;%s;...;%s\n", <collating-element>, <weight>, <weight>, ...

Each collating-element shall consist of either a character (in any of the forms defined in 2.5.2), a <collating-element>, a <collating-symbol>, an ellipsis, or the special symbol UNDEFINED.
The order in which collating elements are specified determines the character collation sequence, such that each collating element shall compare less than the elements following it. The NUL character shall compare lower than any other character.

A <collating-element> shall be used to specify multicharacter collating elements. It indicates that the character sequence specified via the <collating-element> is to be collated as a unit, in the relative order specified by its place.

A <collating-symbol> shall be used to define a position in the relative order for use in weights.

The ellipsis symbol (``...'') specifies that a sequence of characters shall collate according to their encoded character values. It shall be interpreted as indicating that all characters with a coded character set value higher than the value of the character in the preceding line, and lower than the coded character set value for the character in the following line, in the current coded character set, shall be placed in the character collation order between the previous and the following character, in ascending order according to their coded character set values. An initial ellipsis shall be interpreted as if the preceding line specified the NUL character, and a trailing ellipsis as if the following line specified the highest coded character set value in the current coded character set. An ellipsis shall be treated as invalid if the preceding or following lines do not specify characters in the current coded character set. The use of the ellipsis symbol ties the definition to a specific coded character set and may preclude the definition from being portable between implementations.
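The following Python fragment sketches how an ellipsis between two characters expands. It is a model built on assumptions, not a localedef implementation. The helper name expand_ellipses is hypothetical. Unicode code points stand in for the values of the current coded character set, and entries are raw single characters, so symbolic names such as <a> are assumed to have been resolved already. Only interior ellipses are handled. An initial or trailing ellipsis, which the text anchors to NUL or to the highest coded value, is rejected here.

```python
ELLIPSIS = "..."


def expand_ellipses(entries: list[str]) -> list[str]:
    """Replace each interior "..." with the characters whose code values lie
    strictly between its neighbours, in ascending order."""
    result: list[str] = []
    for i, entry in enumerate(entries):
        if entry != ELLIPSIS:
            result.append(entry)
            continue
        if i == 0 or i == len(entries) - 1:
            raise ValueError("initial/trailing ellipsis not modelled here")
        prev, nxt = entries[i - 1], entries[i + 1]
        # Both neighbours must be single characters of the coded character
        # set; a multicharacter element or another ellipsis is invalid.
        if len(prev) != 1 or len(nxt) != 1:
            raise ValueError(f"invalid ellipsis between {prev!r} and {nxt!r}")
        # Yields nothing when the neighbours are adjacent or out of order.
        result.extend(chr(c) for c in range(ord(prev) + 1, ord(nxt)))
    return result


if __name__ == "__main__":
    print("".join(expand_ellipses(["0", "...", "9", "a", "...", "f"])))
    # 0123456789abcdef
```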
The symbol UNDEFINED shall be interpreted as including all coded character set values not specified explicitly or via the ellipsis symbol. Such characters shall be inserted in the character collation order at the point indicated by the symbol, in ascending order according to their coded character set values. If no UNDEFINED symbol is specified, and the current coded character set contains characters not specified in this clause, the utility shall issue a warning message and place such characters at the end of the character collation order.

The optional operands for each collation-element shall be used to define the primary, secondary, or subsequent weights for the collating element. The first operand specifies the relative primary weight, the second the relative secondary weight, and so on. Two or more collation-elements can be assigned the same weight; they belong to the same equivalence class if they have the same primary weight. Collation shall behave as if, for each weight level, IGNOREd elements are removed. Then each successive pair of elements shall be compared according to the relative weights for the elements. If the two strings compare equal, the process shall be repeated for the next weight level, up to the limit {COLL_WEIGHTS_MAX}.

Weights shall be expressed as characters (in any of the forms specified in 2.5.2), <collating-symbol>s, <collating-element>s, an ellipsis, or the special symbol IGNORE. A single character, a <collating-symbol>, or a <collating-element> shall represent the relative order in the character collating sequence of the character or symbol, rather than the character or characters themselves.

One-to-many mapping is indicated by specifying two or more concatenated characters or symbolic names.
Thus, if the character ``'' is given the string as a weight, comparisons shall be performed as if all occurrences of the character are replaced by . If it is desirable to define and as an equivalence class, then a collating-element must be defined for the string ``ss'', as in the example below.

All characters specified via an ellipsis shall by default be assigned unique weights, equal to the relative order of characters. Characters specified via an explicit or implicit UNDEFINED special symbol shall by default be assigned the same primary weight (i.e., belong to the same equivalence class). An ellipsis symbol as a weight shall be interpreted to mean that each character in the sequence shall have unique weights, equal to the relative order of their character in the character collation sequence. Secondary and subsequent weights have unique values. The use of the ellipsis as a weight shall be treated as an error if the collating element is neither an ellipsis nor the special symbol UNDEFINED.

The special keyword IGNORE as a weight shall indicate that when strings are compared using the weights at the level where IGNORE is specified, the collating element shall be ignored; i.e., as if the string did not contain the collating element. In regular expressions and pattern matching, all characters that are IGNOREd in their primary weight form an equivalence class.

An empty operand shall be interpreted as the collating-element itself. For example, the order statement ; is equal to

An ellipsis can be used as an operand if the collating-element was an ellipsis, and shall be interpreted as the value of each character defined by the ellipsis.
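The weight rules above (IGNORE and one-to-many weights) and the order_start directives forward, backward, and position can be combined in a small model. The Python sketch below is illustrative only. The table, the IGNORE sentinel, and the functions level_key and compare_with_directives are hypothetical, and collating elements are single characters. For each level, a table entry holds a tuple of zero or more weights. An empty tuple plays the role of IGNORE, and two or more weights model a one-to-many mapping; here ß is given the primary weights of "ss".

```python
IGNORE = ()  # an empty weight tuple: the element is skipped at that level

# Hypothetical two-level table: TABLE[element][level] is a tuple of weights.
# "ß" gets the primary weights of "ss" (one-to-many) but its own secondary
# weight; the hyphen is IGNOREd at both levels.
TABLE = {
    "a": ((1,), (1,)), "A": ((1,), (2,)),
    "s": ((2,), (1,)), "ß": ((2, 2), (3,)),
    "-": (IGNORE, IGNORE),
}


def level_key(s: str, table: dict, level: int, directives: set[str]) -> list:
    """Collect the weights of s at one level, honouring the directives."""
    elements = list(s)  # single-character collating elements only
    if "backward" in directives:
        elements.reverse()  # compare from the end of the string
    key: list = []
    ignored = 0  # IGNOREd elements seen so far from the start of the compare
    for element in elements:
        weights = table[element][level]
        if weights == IGNORE:
            ignored += 1  # as if the string did not contain the element
            continue
        for weight in weights:
            # With "position", an element preceded by fewer IGNOREd
            # elements sorts first; ties fall back to the weight itself.
            key.append((ignored, weight) if "position" in directives else weight)
    return key


def compare_with_directives(s1: str, s2: str, table: dict, order_start: str) -> int:
    """Return -1, 0 or 1; order_start is an operand such as "forward;backward"."""
    for level, operand in enumerate(order_start.split(";")):
        directives = set(operand.split(","))
        k1 = level_key(s1, table, level, directives)
        k2 = level_key(s2, table, level, directives)
        if k1 != k2:
            return -1 if k1 < k2 else 1
    return 0


if __name__ == "__main__":
    print(compare_with_directives("ss", "ß", TABLE, "forward;forward"))   # -1
    print(compare_with_directives("aA", "Aa", TABLE, "forward;backward"))  # 1
```

In this model, "aA" collates before "Aa" under forward;forward but after it under forward;backward, because the secondary level is read from the end of each string. Suppose instead that only the hyphen carries a second-level weight, as in the rationale's o-ring example. The strings "o-ring" and "or-ing" then tie under forward;forward, but "o-ring" collates first under forward;forward,position.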
The collation order as defined in this clause defines the interpretation of bracket expressions in regular expressions (see 2.8.3.2).

Example:

     order_start forward;backward
     UNDEFINED IGNORE;IGNORE
     ; ... ;... ; ; ; ; ; ; ; ; ; ; ... ;...
     order_end

This example is interpreted as follows:

(1) The UNDEFINED means that all characters not specified in this definition (explicitly or via the ellipsis) shall be ignored for collation purposes; for regular expression purposes they are ordered first.

(2) All characters between and shall belong to the same primary equivalence class and have individual secondary weights based on their ordinal encoded values.

(3) All characters based on the upper- or lowercase character a belong to the same primary equivalence class.

(4) The multicharacter collating element is represented by the collating symbol and belongs to the same primary equivalence class as the multicharacter collating element .

(5) Note that it is not possible to use the collating element as a weight and expect it to be expanded to the string ``ss''. When used as a weight, any collating-element represents the relative order assigned to it in the character collation sequence, not the string from which it was derived (compare with ).

2.5.2.2.5 order_end Keyword

The collating order entries shall be terminated with an order_end keyword.

BEGIN_RATIONALE

2.5.2.2.6 LC_COLLATE Rationale.
(This subclause is not a part of P1003.2)

The LC_COLLATE category governs the collation order in the locale, and thus the processing of the C Standard {7} strxfrm() and strcoll() functions, as well as a number of POSIX.2 utilities.

The rules governing collation depend to some extent on the use. At least five different levels of increasingly complex collation rules can be distinguished:

(1) Byte/machine code order. This is the historical collation order in the UNIX system and many proprietary operating systems. Here, collation is done character by character, without any regard to context. The primary virtues of this order are that it is usually quite fast and completely deterministic; it works well when the native machine collation sequence matches the user's expectations.

(2) Character order. On this level, collation is also done character by character, without regard to context. The order between characters is, however, determined not by the code values but by the user's expectations of the ``correct'' order between characters. In addition, such a (simple) collation order can specify that certain characters collate equal (e.g., upper- and lowercase letters).

(3) String ordering. On this level, entire strings are compared based on relatively straightforward rules. At this level, several ``passes'' may be required to determine the order between two strings. Characters may be ignored in some passes, but not in others; the strings may be compared in different directions; and simple string substitutions may be made before strings are compared. This level is best described as ``dictionary'' ordering; it is based on the spelling, not the pronunciation or meaning, of the words.

(4) Text search ordering.
This is a further refinement of the previous level, best described as ``telephone book ordering''. Some common homophones (words spelled differently but pronounced the same) are collated together, numbers are collated as if spelled out in words, and so on.

(5) Semantic level ordering. Words and strings are collated based on their meaning; entire words (such as ``the'') are eliminated, and the ordering is not deterministic. This usually requires special software and is highly dependent on the intended use.

While the historical collation order is formally at level 1, for the English language it corresponds roughly to elements at level 2. The user expects to see the output from the ls utility sorted very much as it would be in a dictionary. While telephone book ordering would be an optimal goal for standard collation, it was ruled out because the order would be language dependent. Furthermore, a requirement was that the order must be determined solely from the text string and the collation rules; no external information (e.g., ``pronunciation dictionaries'') could be required.

As a result, the goal for the collation support is at level 3. This also matches the requirements for the proposed Canadian collation order, as well as other known collation requirements for alphabetic scripts. It specifically rules out collation based on pronunciation rules or on semantic analysis of the text.
The syntax for the LC_COLLATE category source is the result of a cooperative effort between representatives of many countries and organizations working with international issues, such as UniForum, X/Open, and ISO. It meets the requirements for level 3. It has been verified to produce the correct result with examples based on French, Canadian, and Danish collation order, and it meets the requirements in the X/Open Portability Guide, Issue 3 {B31}. Because it supports multicharacter collating elements, it can also support collation in code sets where a character is expressed using nonspacing characters followed by the base character (such as ISO 6937 {B6}).

The directives that can be specified in an operand to the order_start keyword are based on the requirements specified in several proposed standards and in customary use. The following is a rephrasing of rules defined for ``lexical ordering in English and French'' by the Canadian Standards Association (text in brackets is rephrased):

(1) Once special characters ([punctuation]) have been removed from original strings, the ordering is determined by scanning forward (left to right) [disregarding case and diacriticals].

(2) In case of equivalence, special characters are once again removed from original strings and the ordering is determined by scanning backward (starting from the rightmost character of the string and back), character by character, [disregarding case but considering diacriticals].

(3) In case of repeated equivalence, special characters are removed again from original strings and the ordering is determined by scanning forward, character by character, [considering both case and diacriticals].
(4) If there is still an ordering equivalence after rules (1) through (3) have been applied, then only special characters and the positions they occupy in the string are considered to determine ordering. The string that has a special character in the lowest position comes first. If two strings have a special character in the same position, the character [with the lowest collation value] comes first. In case of equality, the other special characters are considered until there is a difference or all special characters have been exhausted.

It is estimated that the standard covers the requirements for all European languages, and no particular problems are anticipated with Slavic or Middle East character sets. The Far East (particularly Japanese/Chinese) collations are often based on contextual information and pronunciation rules (the same ideogram can have different meanings and different pronunciations). Such collation, in general, falls outside the desired goal of the standard. There are, however, several other collation rules (stroke/radical, or ``most common pronunciation'') that can be supported with the mechanism described here.

Previous drafts contained a substitute statement, which performed a regular-expression-style replacement before string compares. It has been withdrawn based on balloter objections that it was not required for the types of ordering POSIX.2 is aimed at.

The character (and collating element) order is defined by the order in which characters and elements are specified between the order_start and order_end keywords. This character order is used in range expressions in regular expressions (see 2.8). Weights assigned to the characters and elements define the collation sequence; in the absence of weights, the character order is also the collation sequence.

The position keyword was introduced to provide the capability to consider, in a compare, the relative position of non-IGNOREd characters.
As an example, consider the two strings ``o-ring'' and ``or-ing''. Assuming the hyphen is IGNOREd on the first pass, the two strings will compare equal, and the position of the hyphen is immaterial. On the second pass, all characters except the hyphen are IGNOREd, and in the normal case the two strings would again compare equal. By taking position into account, the first collates before the second.

END_RATIONALE

2.5.2.3 LC_MONETARY

Table 2-8 - LC_MONETARY Category Definition in the POSIX Locale
_________________________________________________________________________
LC_MONETARY
# This is the POSIX Locale definition for
# the LC_MONETARY category.
#
int_curr_symbol      ""
currency_symbol      ""
mon_decimal_point    ""
mon_thousands_sep    ""
mon_grouping         ""
positive_sign        ""
negative_sign        ""
int_frac_digits      -1
p_cs_precedes        -1
p_sep_by_space       -1
n_cs_precedes        -1
n_sep_by_space       -1
p_sign_posn          -1
n_sign_posn          -1
#
END LC_MONETARY
_________________________________________________________________________

The LC_MONETARY category shall define the rules and symbols that shall be used to format monetary numeric information. The operands are strings. For some keywords, the strings can contain only integers. Omitting a keyword, setting a string value to the empty string (""), or setting an integer keyword to -1 shall indicate that the value is unspecified. The following keywords shall be recognized:

copy                Specify the name of an existing locale to be used as the source for the definition of this category. If this keyword is specified, no other keyword shall be specified.

int_curr_symbol     The international currency symbol.
The operand shall be a four-character string, with the first three characters containing the alphabetic international currency symbol in accordance with those specified in ISO 4217 {3} (Codes for the representation of currencies and funds). The fourth character shall be the character used to separate the international currency symbol from the monetary quantity.

currency_symbol     The string that shall be used as the local currency symbol.

mon_decimal_point   The operand is a string containing the symbol that shall be used as the decimal delimiter in formatted monetary quantities. In contexts where other standards limit the mon_decimal_point to a single byte, the result of specifying a multibyte operand is unspecified.

mon_thousands_sep   The operand is a string containing the symbol that shall be used as a separator for groups of digits to the left of the decimal delimiter in formatted monetary quantities. In contexts where other standards limit the mon_thousands_sep to a single byte, the result of specifying a multibyte operand is unspecified.

mon_grouping        Define the size of each group of digits in formatted monetary quantities. The operand is a sequence of integers separated by semicolons. Each integer specifies the number of digits in one group. The initial integer defines the size of the group immediately preceding the decimal delimiter, and the following integers define the groups further to the left. If the last integer is not -1, then the size of the previous group (if any) shall be used repeatedly for the remainder of the digits. If the last integer is -1, then no further grouping shall be performed.
positive_sign
        A string that shall be used to indicate a nonnegative-valued
        formatted monetary quantity.

negative_sign
        A string that shall be used to indicate a negative-valued
        formatted monetary quantity.

int_frac_digits
        An integer representing the number of fractional digits (those
        to the right of the decimal delimiter) to be written in a
        formatted monetary quantity using int_curr_symbol.

frac_digits
        An integer representing the number of fractional digits (those
        to the right of the decimal delimiter) to be written in a
        formatted monetary quantity using currency_symbol.

p_cs_precedes
        An integer set to 1 if the currency_symbol or int_curr_symbol
        precedes the value for a nonnegative formatted monetary
        quantity, and set to 0 if the symbol succeeds the value.

p_sep_by_space
        An integer set to 0 if no space separates the currency_symbol or
        int_curr_symbol from the value for a nonnegative formatted
        monetary quantity, set to 1 if a space separates the symbol from
        the value, and set to 2 if a space separates the symbol and the
        sign string, if adjacent.

n_cs_precedes
        An integer set to 1 if the currency_symbol or int_curr_symbol
        precedes the value for a negative formatted monetary quantity,
        and set to 0 if the symbol succeeds the value.

n_sep_by_space
        An integer set to 0 if no space separates the currency_symbol or
        int_curr_symbol from the value for a negative formatted monetary
        quantity, set to 1 if a space separates the symbol from the
        value, and set to 2 if a space separates the symbol and the sign
        string, if adjacent.

p_sign_posn
        An integer set to a value indicating the positioning of the
        positive_sign for a nonnegative formatted monetary quantity. The
        following integer values shall be recognized:

        0    Parentheses enclose the quantity and the currency_symbol or
             int_curr_symbol.
        1    The sign string precedes the quantity and the
             currency_symbol or int_curr_symbol.

        2    The sign string succeeds the quantity and the
             currency_symbol or int_curr_symbol.

        3    The sign string immediately precedes the currency_symbol or
             int_curr_symbol.

        4    The sign string immediately succeeds the currency_symbol or
             int_curr_symbol.

n_sign_posn
        An integer set to a value indicating the positioning of the
        negative_sign for a negative formatted monetary quantity. The
        following integer values shall be recognized:

        0    Parentheses enclose the quantity and the currency_symbol or
             int_curr_symbol.

        1    The sign string precedes the quantity and the
             currency_symbol or int_curr_symbol.

        2    The sign string succeeds the quantity and the
             currency_symbol or int_curr_symbol.

        3    The sign string immediately precedes the currency_symbol or
             int_curr_symbol.

        4    The sign string immediately succeeds the currency_symbol or
             int_curr_symbol.

BEGIN_RATIONALE

2.5.2.3.1 LC_MONETARY Rationale. (This subclause is not a part of P1003.2)

The currency symbol is left empty in the POSIX Locale definition of
LC_MONETARY (Table 2-8) because it is not defined in the C locale of the
C Standard {7}.

The C Standard {7} limits the size of decimal points and thousands
delimiters to single-byte values. In locales based on multibyte coded
character sets this obviously cannot be enforced; this standard does not
prohibit such characters, but makes the behavior unspecified [in the
text ``In contexts where other standards ...''].

The grouping specification is based on, but not identical to, the
C Standard {7}. The ``-1'' signals that no further grouping shall be
performed, the equivalent of {CHAR_MAX} in the C Standard {7}.

The locale definition is an extension of the C Standard {7} localeconv()
specification.
In particular, rules on how currency_symbol is treated are extended to
also cover int_curr_symbol, and p_sep_by_space and n_sep_by_space have
been augmented with the value 2, which places a space between the sign
and the symbol (if they are adjacent; otherwise it should be treated as
a 0). The following table shows the result of various combinations:

                                                p_sep_by_space
                                        2          1          0

    p_cs_precedes = 1  p_sign_posn = 0  ($1.25)    ($ 1.25)   ($1.25)
                       p_sign_posn = 1  + $1.25    +$ 1.25    +$1.25
                       p_sign_posn = 2  $1.25 +    $ 1.25+    $1.25+
                       p_sign_posn = 3  + $1.25    +$ 1.25    +$1.25
                       p_sign_posn = 4  $ +1.25    $+ 1.25    $+1.25

    p_cs_precedes = 0  p_sign_posn = 0  (1.25 $)   (1.25 $)   (1.25$)
                       p_sign_posn = 1  +1.25 $    +1.25 $    +1.25$
                       p_sign_posn = 2  1.25$ +    1.25 $+    1.25$+
                       p_sign_posn = 3  1.25+ $    1.25 +$    1.25+$
                       p_sign_posn = 4  1.25$ +    1.25 $+    1.25$+

The following is an example of the interpretation of the mon_grouping
keyword. Assuming that the value to be formatted is 123456789 and the
mon_thousands_sep is ' (an apostrophe), the following table shows the
result. The third column shows the equivalent C Standard {7} string that
would be used to accommodate this grouping. It is the responsibility of
the utility to perform mappings of the formats in this clause to those
used by language bindings such as the C Standard {7}.

    mon_grouping   Formatted Value   C Standard {7} String
    ____________   _______________   _____________________
    3;-1           123456'789        "\3\177"
    3              123'456'789       "\3"
    3;2;-1         1234'56'789       "\3\2\177"
    3;2            12'34'56'789      "\3\2"
    -1             123456789         "\177"

In these examples, the octal value of {CHAR_MAX} is 177.

END_RATIONALE

2.5.2.4 LC_NUMERIC

The LC_NUMERIC category shall define the rules and symbols that shall be
used to format nonmonetary numeric information. The operands are
strings. For some keywords, the strings can contain only integers.
Keywords that are not provided, string values set to the empty string
(""), or integer keywords set to -1, shall be used to indicate that the
value is unspecified. The following keywords shall be recognized:

copy
        Specify the name of an existing locale to be used as the source
        for the definition of this category. If this keyword is
        specified, no other keyword shall be specified.

decimal_point
        The operand is a string containing the symbol that shall be used
        as the decimal delimiter in numeric, nonmonetary formatted
        quantities. This keyword cannot be omitted and cannot be set to
        the empty string. In contexts where other standards limit the
        decimal_point to a single byte, the result of specifying a
        multibyte operand is unspecified.

thousands_sep
        The operand is a string containing the symbol that shall be used
        as a separator for groups of digits to the left of the decimal
        delimiter in numeric, nonmonetary formatted quantities. In
        contexts where other standards limit the thousands_sep to a
        single byte, the result of specifying a multibyte operand is
        unspecified.

grouping
        Define the size of each group of digits in formatted nonmonetary
        quantities. The operand is a sequence of integers separated by
        semicolons. Each integer specifies the number of digits in each
        group, with the initial integer defining the size of the group
        immediately preceding the decimal delimiter, and the following
        integers defining the preceding groups. If the last integer is
        not -1, then the size of the previous group (if any) shall be
        repeatedly used for the remainder of the digits. If the last
        integer is -1, then no further grouping shall be performed.
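The grouping rule shared by grouping and mon_grouping, and the mapping to a C Standard {7} string shown in the LC_MONETARY rationale, can be sketched as follows. This is an illustration only, written in Python: the function names parse_grouping, group_digits and grouping_to_c_string are hypothetical, the operand is assumed to be the semicolon-separated keyword text, and {CHAR_MAX} is assumed to be 127 (octal 177), as in the rationale. The text above does not say what a 0 entry means; the sketch assumes it ends grouping, like -1.

```python
from typing import List

# Assumption: 8-bit signed char, so {CHAR_MAX} is octal 177 (decimal 127),
# matching the value used in the LC_MONETARY rationale.
CHAR_MAX = 127


def parse_grouping(operand: str) -> List[int]:
    """Split a grouping operand such as "3;2;-1" into integers."""
    return [int(field) for field in operand.split(";")]


def group_digits(digits: str, operand: str, sep: str = "'") -> str:
    """Insert sep between the digit groups described by operand."""
    sizes = parse_grouping(operand)
    groups = []  # collected right to left, starting at the decimal delimiter
    rest = digits
    size = 0
    index = 0
    while rest:
        if index < len(sizes):
            size = sizes[index]  # next explicit group size
            index += 1
        # Once the list is exhausted, the previous size is reused.
        # -1 means "no further grouping"; treating 0 the same way is an
        # assumption, since the text does not define it.
        if size <= 0:
            break
        groups.append(rest[-size:])
        rest = rest[:-size]
    if rest:
        groups.append(rest)  # leading digits left ungrouped
    return sep.join(reversed(groups))


def grouping_to_c_string(operand: str) -> str:
    """Map an operand to the equivalent C Standard grouping string."""
    return "".join(chr(CHAR_MAX) if n == -1 else chr(n)
                   for n in parse_grouping(operand))


if __name__ == "__main__":
    for operand in ("3;-1", "3", "3;2;-1", "3;2", "-1"):
        print(operand, group_digits("123456789", operand))
```

Run as a script, the loop prints the Formatted Value column of the mon_grouping table in the LC_MONETARY rationale (123456'789, 123'456'789, 1234'56'789, 12'34'56'789 and 123456789). Working right to left from the decimal delimiter keeps the "initial integer" rule direct: the first size taken from the list always describes the group nearest the delimiter.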
        Table 2-9 - LC_NUMERIC Category Definition in the POSIX Locale
______________________________________________________________________
LC_NUMERIC
# This is the POSIX Locale definition for
# the LC_NUMERIC category.
#
decimal_point   "."
thousands_sep   ""
grouping        0
#
END LC_NUMERIC
______________________________________________________________________

BEGIN_RATIONALE

2.5.2.4.1 LC_NUMERIC Rationale. (This subclause is not a part of P1003.2)

See the rationale for LC_MONETARY (2.5.2.3.1) for a description of the
behavior of grouping.

END_RATIONALE

2.5.2.5 LC_TIME

The LC_TIME category shall define the interpretation of the field
descriptors supported by the date utility (see 4.15).

        Table 2-10 - LC_TIME Category Definition in the POSIX Locale
______________________________________________________________________
LC_TIME
# This is the POSIX Locale definition for
# the LC_TIME category.
#
# Abbreviated weekday names (%a)
abday       "Sun";"Mon";"Tue";"Wed";\
            "Thu";"Fri";"Sat"
#
# Full weekday names (%A)
day         "Sunday";"Monday";\
            "Tuesday";"Wednesday";\
            "Thursday";"Friday";\
            "Saturday"
#
# Abbreviated month names (%b)
abmon       "Jan";"Feb";"Mar";\
            "Apr";"May";"Jun";\
            "Jul";"Aug";"Sep";\
            "Oct";"Nov";"Dec"
#
# Full month names (%B)
mon         "January";"February";\
            "March";"April";\
            "May";"June";\
            "July";"August";\
            "September";"October";\
            "November";"December"
#
# Equivalent of AM/PM (%p)
am_pm       "AM";"PM"
#
# Appropriate date and time representation (%c)
d_t_fmt     "%a %b %e %H:%M:%S %Y"
#
# Appropriate date representation (%x)
d_fmt       "%m/%d/%y"
#
# Appropriate time representation (%X)
t_fmt       "%H:%M:%S"
#
# Appropriate 12-hour time representation (%r)
t_fmt_ampm  "%I:%M:%S %p"
#
END LC_TIME
______________________________________________________________________

The following mandatory keywords shall be recognized:

copy
        Specify the name of an existing locale to be used as the source
        for the definition of this category. If this keyword is
        specified, no other keyword shall be specified.

abday
        Define the abbreviated weekday names, corresponding to the %a
        field descriptor. The operand shall consist of seven
        semicolon-separated strings. The first string shall be the
        abbreviated name of the first day of the week (Sunday), the
        second the abbreviated name of the second day, and so on.

day
        Define the full weekday names, corresponding to the %A field
        descriptor. The operand shall consist of seven
        semicolon-separated strings. The first string shall be the full
        name of the first day of the week (Sunday), the second the full
        name of the second day, and so on.

abmon
        Define the abbreviated month names, corresponding to the %b
        field descriptor. The operand shall consist of twelve
        semicolon-separated strings. The first string shall be the
        abbreviated name of the first month of the year (January), the
        second the abbreviated name of the second month, and so on.

mon
        Define the full month names, corresponding to the %B field
        descriptor. The operand shall consist of twelve
        semicolon-separated strings. The first string shall be the full
        name of the first month of the year (January), the second the
        full name of the second month, and so on.

d_t_fmt
        Define the appropriate date and time representation,
        corresponding to the %c field descriptor. The operand shall
        consist of a string, and can contain any combination of
        characters and field descriptors. In addition, the string can
        contain escape sequences defined in Table 2-15.
d_fmt
        Define the appropriate date representation, corresponding to the
        %x field descriptor. The operand shall consist of a string, and
        can contain any combination of characters and field descriptors.
        In addition, the string can contain escape sequences defined in
        Table 2-15.

t_fmt
        Define the appropriate time representation, corresponding to the
        %X field descriptor. The operand shall consist of a string, and
        can contain any combination of characters and field descriptors.
        In addition, the string can contain escape sequences defined in
        Table 2-15.

am_pm
        Define the appropriate representation of the ante meridiem and
        post meridiem strings, corresponding to the %p field descriptor.
        The operand shall consist of two strings, separated by a
        semicolon. The first string shall represent the ante meridiem
        designation, the last string the post meridiem designation.

t_fmt_ampm
        Define the appropriate time representation in the 12-hour clock
        format with am_pm, corresponding to the %r field descriptor. The
        operand shall consist of a string and can contain any
        combination of characters and field descriptors. If the string
        is empty, the 12-hour format is not supported in the locale.

It is implementation defined whether the following optional keywords
shall be recognized. If they are not supported, but present in a
localedef source, they shall be ignored.

era
        Shall be used to define alternate Eras, corresponding to the %E
        field descriptor modifier. The format of the operand is
        unspecified, but shall support the definition of the %EC and %Ey
        field descriptors, and may also define the era_year format
        (%EY).

era_year
        Shall be used to define the format of the year in alternate Era
        format, corresponding to the %EY field descriptor.
era_d_fmt
        Shall be used to define the format of the date in alternate Era
        notation, corresponding to the %Ex field descriptor.

alt_digits
        Shall be used to define alternate symbols for digits,
        corresponding to the %O field descriptor modifier. The operand
        shall consist of semicolon-separated strings. The first string
        shall be the alternate symbol corresponding with zero, the
        second string the symbol corresponding with one, and so on. Up
        to 100 alternate symbol strings can be specified. The %O
        modifier indicates that the string corresponding to the value
        specified via the field descriptor shall be used instead of the
        value.

BEGIN_RATIONALE

2.5.2.5.1 LC_TIME Rationale. (This subclause is not a part of P1003.2)

Although certain of the field descriptors in the POSIX Locale (such as
the name of the month) are shown with initial capital letters, this need
not be the case in other locales. Programs using these fields may need
to adjust the capitalization if the output is going to be used at the
beginning of a sentence.

The LC_TIME descriptions of abday, day, and abmon imply a Gregorian-style
calendar (7-day weeks, 12-month years, leap years, etc.). Formatting
time strings for other types of calendars is outside the scope of this
standard.

As specified under the date utility, the field descriptors corresponding
to the optional keywords consist of a modifier followed by a traditional
field descriptor (for instance %Ex). If the optional keywords are not
supported by the implementation or are unspecified for the current
locale, these field descriptors shall be treated as the traditional
field descriptor.
For instance, assume the following keywords:

    alt_digits  "0th";"1st";"2nd";"3rd";"4th";"5th";\
                "6th";"7th";"8th";"9th";"10th"

    d_fmt       "The %Od day of %B in %Y"

On 7/4/1776, the %x field descriptor would result in ``The 4th day of
July in 1776,'' while 7/14/1789 would come out as ``The 14 day of July
in 1789,'' because alt_digits defines no alternate symbol for 14. Note
that the above example is for illustrative purposes only; the %O
modifier is primarily intended to provide for Kanji or Hindi digits in
date formats.

While it is clear that an alternate year format is required, there is no
consensus on the format or the requirements. As a result, while these
keywords are reserved, the details are left unspecified. It is expected
that National Standards Bodies will provide specifications.

END_RATIONALE

2.5.2.6 LC_MESSAGES

The LC_MESSAGES category shall define the format and values for
affirmative and negative responses. The operands shall be strings or
extended regular expressions; see 2.8.4. The following keywords shall
be recognized:

copy
        Specify the name of an existing locale to be used as the source
        for the definition of this category. If this keyword is
        specified, no other keyword shall be specified.

yesexpr
        The operand shall consist of an extended regular expression that
        describes the acceptable affirmative response to a question
        expecting an affirmative or negative response.

noexpr
        The operand shall consist of an extended regular expression that
        describes the acceptable negative response to a question
        expecting an affirmative or negative response.

        Table 2-11 - LC_MESSAGES Category Definition in the POSIX Locale
______________________________________________________________________
LC_MESSAGES
# This is the POSIX Locale definition for
# the LC_MESSAGES category.
#
yesexpr     "^[yY]"
#
noexpr      "^[nN]"
END LC_MESSAGES
______________________________________________________________________

BEGIN_RATIONALE

2.5.2.6.1 LC_MESSAGES Rationale. (This subclause is not a part of P1003.2)

The LC_MESSAGES category is described in 2.6 as affecting the language
used by utilities for their output. The mechanism used by the
implementation to accomplish this, other than the responses shown here
in the locale definition file, is not specified by this version of this
standard. The POSIX.1 working group is developing an interface that
would allow applications (and, presumably, some of the standard
utilities) to access messages from various message catalogs, tailored
to a user's LC_MESSAGES value.

END_RATIONALE

2.5.3 Locale Definition Grammar

The grammar and lexical conventions in this subclause shall together
describe the syntax for the locale definition source. The general
conventions for this style of grammar are described in 2.1.2. Any
discrepancies found between this grammar and other descriptions in this
clause shall be resolved in favor of this grammar.

2.5.3.1 Locale Lexical Conventions

The lexical conventions for the locale definition grammar are described
in this subclause.

The following tokens shall be processed (in addition to those string
constants shown in the grammar):

LOC_NAME
        A string of characters representing the name of a locale.

CHAR
        Any single character.

NUMBER
        A decimal number, represented by one or more decimal digits.

COLLSYMBOL
        A symbolic name, enclosed between angle brackets. The string
        shall not duplicate any charmap symbol defined in the current
        charmap (if any), or a COLLELEMENT symbol.
COLLELEMENT
        A symbolic name, enclosed between angle brackets, which shall not
        duplicate either any charmap symbol or a CHARSYMBOL symbol.

CHARSYMBOL
        A symbolic name, enclosed between angle brackets, from the
        current charmap (if any).

OCTAL_CHAR
        One or more octal representations of the encoding of each byte
        in a single character. The octal representation consists of an
        escape_char (normally a backslash) followed by two or more octal
        digits.

HEX_CHAR
        One or more hexadecimal representations of the encoding of each
        byte in a single character. The hexadecimal representation
        consists of an escape_char followed by the constant 'x' and two
        or more hexadecimal digits.

DECIMAL_CHAR
        One or more decimal representations of the encoding of each byte
        in a single character. The decimal representation consists of an
        escape_char followed by a 'd' and two or more decimal digits.

ELLIPSIS
        The string ``...''.

EXTENDED_REG_EXP
        An extended regular expression as defined in the grammar in
        2.8.5.2.

EOL
        The line termination character <newline>.

2.5.3.2 Locale Grammar

This subclause presents the grammar for the locale definition.

    %token  LOC_NAME
    %token  CHAR
    %token  NUMBER
    %token  COLLSYMBOL COLLELEMENT
    %token  CHARSYMBOL OCTAL_CHAR HEX_CHAR DECIMAL_CHAR
    %token  ELLIPSIS
    %token  EXTENDED_REG_EXP
    %token  EOL

    %start  locale_definition

    %%

    locale_definition   : global_statements locale_categories
                        | locale_categories
                        ;

    global_statements   : global_statements symbol_redefine
                        | symbol_redefine
                        ;
    symbol_redefine     : '#escape_char' CHAR EOL
                        | '#comment_char' CHAR EOL
                        ;

    locale_categories   : locale_categories locale_category
                        | locale_category
                        ;

    locale_category     : lc_ctype | lc_collate | lc_messages
                        | lc_monetary | lc_numeric | lc_time
                        ;

    /* The following grammar rules are common to all categories */

    char_list           : char_list char_symbol
                        | char_symbol
                        ;

    char_symbol         : CHAR | CHARSYMBOL
                        | OCTAL_CHAR | HEX_CHAR | DECIMAL_CHAR
                        ;

    locale_name         : LOC_NAME
                        | '"' LOC_NAME '"'
                        ;

    /* The following is the LC_CTYPE category grammar */

    lc_ctype            : ctype_hdr ctype_keywords ctype_tlr
                        | ctype_hdr 'copy' locale_name EOL ctype_tlr
                        ;

    ctype_hdr           : 'LC_CTYPE' EOL
                        ;

    ctype_keywords      : ctype_keywords ctype_keyword
                        | ctype_keyword
                        ;

    ctype_keyword       : charclass_keyword charclass_list EOL
                        | charconv_keyword charconv_list EOL
                        ;

    charclass_keyword   : 'upper' | 'lower' | 'alpha' | 'digit'
                        | 'alnum' | 'xdigit' | 'space' | 'print'
                        | 'graph' | 'blank' | 'cntrl'
                        ;
    charclass_list      : charclass_list ';' char_symbol
                        | charclass_list ';' ELLIPSIS ';' char_symbol
                        | char_symbol
                        ;

    charconv_keyword    : 'toupper'
                        | 'tolower'
                        ;

    charconv_list       : charconv_list ';' charconv_entry
                        | charconv_entry
                        ;

    charconv_entry      : '(' char_symbol ',' char_symbol ')'
                        ;

    ctype_tlr           : 'END' 'LC_CTYPE' EOL
                        ;

    /* The following is the LC_COLLATE category grammar */

    lc_collate          : collate_hdr collate_keywords collate_tlr
                        | collate_hdr 'copy' locale_name EOL collate_tlr
                        ;

    collate_hdr         : 'LC_COLLATE' EOL
                        ;

    collate_keywords    : order_statements
                        | opt_statements order_statements
                        ;

    opt_statements      : opt_statements collating_symbols
                        | opt_statements collating_elements
                        | collating_symbols
                        | collating_elements
                        ;

    collating_symbols   : 'collating-symbol' COLLSYMBOL EOL
                        ;

    collating_elements  : 'collating-element' COLLELEMENT
                          'from' '"' char_list '"' EOL
                        ;

    order_statements    : order_start collation_order order_end
                        ;
    order_start         : 'order_start' EOL
                        | 'order_start' order_opts EOL
                        ;

    order_opts          : order_opts ';' order_opt
                        | order_opt
                        ;

    order_opt           : order_opt ',' opt_word
                        | opt_word
                        ;

    opt_word            : 'forward' | 'backward' | 'position'
                        ;

    collation_order     : collation_order collation_entry
                        | collation_entry
                        ;

    collation_entry     : COLLSYMBOL EOL
                        | collation_element weight_list EOL
                        | collation_element EOL
                        ;

    collation_element   : char_symbol
                        | COLLELEMENT
                        | ELLIPSIS
                        | 'UNDEFINED'
                        ;

    weight_list         : weight_list ';' weight_symbol
                        | weight_list ';'
                        | weight_symbol
                        ;

    weight_symbol       : char_symbol
                        | COLLSYMBOL
                        | '"' char_list '"'
                        | ELLIPSIS
                        | 'IGNORE'
                        ;

    order_end           : 'order_end' EOL
                        ;

    collate_tlr         : 'END' 'LC_COLLATE' EOL
                        ;

    /* The following is the LC_MESSAGES category grammar */

    lc_messages         : messages_hdr messages_keywords messages_tlr
                        | messages_hdr 'copy' locale_name EOL messages_tlr
                        ;

    messages_hdr        : 'LC_MESSAGES' EOL
                        ;

    messages_keywords   : messages_keywords messages_keyword
                        | messages_keyword
                        ;

    messages_keyword    : 'yesexpr' '"' EXTENDED_REG_EXP '"' EOL
                        | 'noexpr' '"' EXTENDED_REG_EXP '"' EOL
                        ;

    messages_tlr        : 'END' 'LC_MESSAGES' EOL
                        ;

    /* The following is the LC_MONETARY category grammar */

    lc_monetary         : monetary_hdr monetary_keywords monetary_tlr
                        | monetary_hdr 'copy' locale_name EOL monetary_tlr
                        ;

    monetary_hdr        : 'LC_MONETARY' EOL
                        ;

    monetary_keywords   : monetary_keywords monetary_keyword
                        | monetary_keyword
                        ;

    monetary_keyword    : mon_keyword_string mon_string EOL
                        | mon_keyword_char NUMBER EOL
                        | mon_keyword_char '-1' EOL
                        | mon_keyword_grouping mon_group_list EOL
                        ;

    mon_keyword_string  : 'int_curr_symbol' | 'currency_symbol'
                        | 'mon_decimal_point'
                        | 'mon_thousands_sep'
                        | 'positive_sign' | 'negative_sign'
                        ;

    mon_string          : '"' char_list '"'
                        | '""'
                        ;

    mon_keyword_char    : 'int_frac_digits' | 'frac_digits'
                        | 'p_cs_precedes' | 'p_sep_by_space'
                        | 'n_cs_precedes' | 'n_sep_by_space'
                        | 'p_sign_posn' | 'n_sign_posn'
                        ;

    mon_keyword_grouping : 'mon_grouping'
                         ;

    mon_group_list      : NUMBER
                        | mon_group_list ';' NUMBER
                        ;

    monetary_tlr        : 'END' 'LC_MONETARY' EOL
                        ;

    /* The following is the LC_NUMERIC category grammar */

    lc_numeric          : numeric_hdr numeric_keywords numeric_tlr
                        | numeric_hdr 'copy' locale_name EOL numeric_tlr
                        ;

    numeric_hdr         : 'LC_NUMERIC' EOL
                        ;

    numeric_keywords    : numeric_keywords numeric_keyword
                        | numeric_keyword
                        ;

    numeric_keyword     : num_keyword_string num_string EOL
                        | num_keyword_grouping num_group_list EOL
                        ;

    num_keyword_string  : 'decimal_point'
                        | 'thousands_sep'
                        ;

    num_string          : '"' char_list '"'
                        | '""'
                        ;

    num_keyword_grouping : 'num_grouping'
                         ;

    num_group_list      : NUMBER
                        | num_group_list ';' NUMBER
                        ;
    numeric_tlr         : 'END' 'LC_NUMERIC' EOL
                        ;

    /* The following is the LC_TIME category grammar */

    lc_time             : time_hdr time_keywords time_tlr
                        | time_hdr 'copy' locale_name EOL time_tlr
                        ;

    time_hdr            : 'LC_TIME' EOL
                        ;

    time_keywords       : time_keywords time_keyword
                        | time_keyword
                        ;

    time_keyword        : time_keyword_name time_list EOL
                        | time_keyword_fmt time_string EOL
                        | time_keyword_opt time_list EOL
                        ;

    time_keyword_name   : 'abday' | 'day' | 'abmon' | 'mon'
                        ;

    time_keyword_fmt    : 'd_t_fmt' | 'd_fmt' | 't_fmt' | 'am_pm'
                        | 't_fmt_ampm'
                        ;

    time_keyword_opt    : 'era' | 'era_year' | 'era_d_fmt' | 'alt_digits'
                        ;

    time_list           : time_list ';' time_string
                        | time_string
                        ;

    time_string         : '"' char_list '"'
                        ;

    time_tlr            : 'END' 'LC_TIME' EOL
                        ;

BEGIN_RATIONALE

2.5.4 Locale Definition Example. (This subclause is not a part of P1003.2)

The following is an example of a locale definition file that could be
used as input to the localedef utility. It assumes that the utility is
executed with the -f option, naming a charmap file with (at least) the
following content:

    CHARMAP
        \x20
        \x24
        \101
        \141
        \346
        \365
        \300
        \366
        \142
        \103
        \143
        \347
        \x64
        \110
        \150
        \xb7
        \x73
        \x7a
    END CHARMAP

It should not be taken as complete or to represent any actual locale,
but only to illustrate the syntax. A further set of examples is offered
as part of Annex F.
    #
    LC_CTYPE
    lower       ;;;;;...;
    upper       A;B;C;Ç;...;Z
    space       \x20;\x09;\x0a;\x0b;\x0c;\x0d
    blank       \040;\011
    toupper     (,);(b,B);(c,C);(ç,Ç);(d,D);(z,Z)
    END LC_CTYPE
    #
    LC_COLLATE
    #
    # The following example of collation is based on the proposed
    # Canadian standard Z243.4.1-1990, "Canadian Alphanumeric
    # Ordering Standard For Character sets of CSA Z234.4 Standard".
    # (Other parts of this example locale definition file do not
    # purport to relate to Canada, or to any other real culture.)
    # The proposed standard defines a 4-weight collation, such that
    # in the first pass, characters are compared without regard to
    # case or accents; in the second pass, backwards compare without
    # regard to case; in the third pass, forward compare without
    # regard to diacriticals. In the first three passes, non-alphabetic
    # characters are ignored; in the fourth pass, only special
    # characters are considered, such that "The string that has a
    # special character in the lowest position comes first. If two
    # strings have a special character in the same position, the
    # collation value of the special character determines ordering."
    #
    # Only a subset of the character set is used here; mostly to
    # illustrate the set-up.
    #
    #
    collating-symbol
    collating-symbol
    collating-symbol
    collating-symbol
    collating-symbol
    collating-symbol
    collating-symbol
    collating-symbol
    collating-symbol
    collating-symbol
    # Further collating-symbols follow.
    #
    # Properly, the standard does not include any multi-character
    # collating elements; the one below is added for completeness.
    #
    collating-element from
    collating-element from
    collating-element from
    #
    order_start forward;backward;forward;forward,position
    #
    # Collating symbols are specified first in the sequence to allocate
    # basic collation values to them, lower than that of any character.
    # Further collating symbols are given a basic collating value here.
    #
    # Here follow the special characters.
    IGNORE;IGNORE;IGNORE;
    # Other special characters follow here.
    #
    # Here come the regular characters.
    ;;;IGNORE
    ;;;IGNORE
    ;;;IGNORE
    ;;;IGNORE
    ;;;IGNORE
    ;;;IGNORE
    ;;;IGNORE
    ;;;IGNORE
    ;;;IGNORE
    ;;;IGNORE
    ;;;IGNORE
    ;;;IGNORE
    ;;;IGNORE
    ;;;IGNORE
    ;;;IGNORE
    #
    # As an example, the strings "Bach" and "bach" could be encoded (for
    # compare purposes) as:
    # "Bach"  ;;;;;;\
    #         ;;;;;
    # "bach"  ;;;;;;\
    #         ;;;;;
    #
    # The two strings are equal in passes 1 and 2, but differ in pass 3.
    #
    # Further characters follow.
    #
    UNDEFINED   IGNORE;IGNORE;IGNORE;IGNORE
    #
    order_end
    #
    END LC_COLLATE
    #
    LC_MONETARY
    int_curr_symbol     "USD "
    currency_symbol     "$"
    mon_decimal_point   "."
    mon_grouping        3;0
    positive_sign       ""
    negative_sign       "-"
    p_cs_precedes       1
    n_sign_posn         0
    END LC_MONETARY
    #
    LC_NUMERIC
    copy "US_en.ASCII"
    END LC_NUMERIC
    #
    LC_TIME
    abday   "Sun";"Mon";"Tue";"Wed";"Thu";"Fri";"Sat"
    #
    day     "Sunday";"Monday";"Tuesday";"Wednesday";\
            "Thursday";"Friday";"Saturday"
    #
    abmon   "Jan";"Feb";"Mar";"Apr";"May";"Jun";\
            "Jul";"Aug";"Sep";"Oct";"Nov";"Dec"
    #
    mon     "January";"February";"March";"April";\
            "May";"June";"July";"August";"September";\
            "October";"November";"December"
    #
    d_t_fmt "%a %b %d %T %Z %Y\n"
    END LC_TIME
    #
    LC_MESSAGES
    yesexpr "^([yY][[:alpha:]]*)|(OK)"
    #
    noexpr  "^[nN][[:alpha:]]*"
    END LC_MESSAGES

END_RATIONALE
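A utility asking a yes/no question might apply the yesexpr and noexpr values from the LC_MESSAGES part of the example locale roughly as in the following Python sketch. It illustrates the idea under stated assumptions and does not describe any implementation. Python's re module stands in for an extended regular expression matcher. The hypothetical helper ere_to_python rewrites only the bracket expressions [[:alpha:]] and [[:digit:]], using Unicode approximations rather than the classes a locale's LC_CTYPE would define. The sketch uses re.search because, like regexec(), it reports a match anywhere in the string unless the expression is anchored.

```python
import re
from typing import Optional

# Hypothetical translation table: each POSIX bracket expression made of a
# single character class maps to an approximate Python equivalent. Real
# class membership comes from LC_CTYPE; these are Unicode approximations.
_CLASS_MAP = {
    "[[:alpha:]]": r"[^\W\d_]",  # letters only
    "[[:digit:]]": r"[0-9]",
}


def ere_to_python(ere: str) -> str:
    """Rewrite the few ERE constructs that Python's re does not accept."""
    for posix_class, python_class in _CLASS_MAP.items():
        ere = ere.replace(posix_class, python_class)
    return ere


def classify_response(answer: str, yesexpr: str, noexpr: str) -> Optional[str]:
    """Return "yes", "no", or None if the answer matches neither expression."""
    # re.search, like regexec(), finds a match anywhere unless anchored.
    if re.search(ere_to_python(yesexpr), answer):
        return "yes"
    if re.search(ere_to_python(noexpr), answer):
        return "no"
    return None


# Values copied from the LC_MESSAGES section of the example locale.
YESEXPR = "^([yY][[:alpha:]]*)|(OK)"
NOEXPR = "^[nN][[:alpha:]]*"

if __name__ == "__main__":
    for answer in ("yes", "Yup", "OK", "no", "Nope", "maybe"):
        print(answer, classify_response(answer, YESEXPR, NOEXPR))
```

Alternation has the lowest precedence in an extended regular expression, so the ^ in the example's yesexpr anchors only the first alternative. With search semantics, an answer such as "NOT OK" is therefore also classified as affirmative, which a locale author may or may not intend.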
2.6 Environment Variables

Environment variables defined in this clause affect the operation of multiple utilities and applications. There are other environment variables that are of interest only to specific utilities. Environment variables that apply to a single utility only are defined as part of the utility description. See the Environment Variables subclause of the utility descriptions for information on environment variable usage.

The value of an environment variable is a string of characters, as described in 2.7 in POSIX.1 {8}.

Environment variable names used by the standard utilities shall consist solely of uppercase letters, digits, and the _ (underscore) from the characters defined in 2.4. The namespace of environment variable names containing lowercase letters shall be reserved for applications. Applications can define any environment variables with names from this namespace without modifying the behavior of the standard utilities.

If the following variables are present in the environment during the execution of an application or utility, they are given the meaning described below. They may be put into the environment, or changed, by either the implementation or the user. If they are defined in the utility's environment, the standard utilities assume they have the specified meaning. Conforming applications shall not set these environment variables to have meanings other than as described. See 7.2 and 3.12 for methods of accessing these variables.

HOME
    A pathname of the user's home directory.

LANG
    This variable shall determine the locale category for any category not specifically selected via a variable starting with LC_. LANG and the LC_ variables can be used by applications to determine the language for messages and instructions, collating sequences, date formats, etc. Additional semantics of this variable, if any, are implementation defined.
LC_ALL
    This variable shall override the value of the LANG variable and the value of any of the other variables starting with LC_.

LC_COLLATE
    This variable shall determine the locale category for character collation information within bracketed regular expressions and for sorting. This environment variable determines the behavior of ranges, equivalence classes, and multicharacter collating elements. Additional semantics of this variable, if any, are implementation defined.

LC_CTYPE
    This variable shall determine the locale category for character handling functions. This environment variable shall determine the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters), the classification of characters (e.g., alpha, digit, graph), and the behavior of character classes. Additional semantics of this variable, if any, are implementation defined.

LC_MESSAGES
    This variable shall determine the locale category for processing affirmative and negative responses and the language and cultural conventions in which messages should be written. Additional semantics of this variable, if any, are implementation defined. The language and cultural conventions of diagnostic and informative messages whose format is unspecified by this standard should be affected by the setting of LC_MESSAGES.

LC_MONETARY
    This variable shall determine the locale category for monetary-related numeric formatting information. Additional semantics of this variable, if any, are implementation defined.

LC_NUMERIC
    This variable shall determine the locale category for numeric formatting (for example, thousands separator and radix character) information. Additional semantics of this variable, if any, are implementation defined.
LC_TIME
    This variable shall determine the locale category for date and time formatting information. Additional semantics of this variable, if any, are implementation defined.

LOGNAME
    The user's login name.

PATH
    The sequence of path prefixes that certain functions and utilities apply in searching for an executable file known only by a filename. The prefixes shall be separated by a colon (:). When a nonzero-length prefix is applied to this filename, a slash shall be inserted between the prefix and the filename. A zero-length prefix is an obsolescent feature that indicates the current working directory. It appears as two adjacent colons (::), as an initial colon preceding the rest of the list, or as a trailing colon following the rest of the list. A Strictly Conforming POSIX.2 Application shall use an actual pathname (such as '.') to represent the current working directory in PATH. The list shall be searched from beginning to end, applying the filename to each prefix, until an executable file with the specified name and appropriate execution permissions is found. If the pathname being sought contains a slash, the search through the path prefixes shall not be performed. If the pathname begins with a slash, the specified path shall be resolved as described in 2.2.2.104. If PATH is unset or is set to null, the path search is implementation-defined.

SHELL
    A pathname of the user's preferred command language interpreter. If this interpreter does not conform to the shell command language in Section 3, utilities may behave differently than described in this standard.

TMPDIR
    A pathname of a directory made available for programs that need a place to create temporary files.

TERM
    The terminal type for which output is to be prepared.
    This information is used by utilities and application programs wishing to exploit special capabilities specific to a terminal. The format and allowable values of this environment variable are unspecified.

TZ
    Time-zone information. The format is described in POSIX.1 {8} 8.1.1.

The environment variables LANG, LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and LC_TIME (LC_*) provide for the support of internationalized applications. The standard utilities shall make use of these environment variables as described in this clause and the individual Environment Variables subclauses for the utilities. If these variables specify locale categories that are not based upon the same underlying code set, the results are unspecified.

For utilities used in internationalized applications, if LC_ALL is not set in the environment or is set to the empty string, and if any of the LC_* variables is not set in the environment or is set to the empty string, the operational behavior of the utility for the corresponding locale category shall be determined by the setting of the LANG environment variable. If the LANG environment variable is not set or is set to the empty string, the implementation-defined default locale shall be used.

If LANG (or any of the LC_* environment variables) contains the value "C", or the value "POSIX", the POSIX Locale shall be selected and the standard utilities shall behave in accordance with the rules in 2.5.1 for the associated category.

If LANG (or any of the LC_* environment variables) begins with a slash, it shall be interpreted as the pathname of a file that was created in the output format used by the localedef utility; see 4.35.6.3. Referencing such a pathname shall result in that locale being used for the category indicated.
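The category-selection rules above amount to a short decision procedure. The C sketch below models that procedure. It is illustrative only. The functions `resolve_category()`, `resolve_from_env()` and `classify_locale()` and the type `enum locale_kind` are hypothetical names. The sketch does not decide whether a name other than "C" or "POSIX" is actually valid, because that is implementation defined.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* How a selected value is interpreted under the rules above. */
enum locale_kind {
    LOCALE_DEFAULT,   /* nothing set: implementation-defined default */
    LOCALE_POSIX,     /* "C" or "POSIX": the POSIX Locale (2.5.1) */
    LOCALE_PATHNAME,  /* leading slash: a file produced by localedef */
    LOCALE_OTHER      /* implementation-defined or unrecognized name */
};

/* Unset and set-to-the-empty-string are treated alike. */
static int is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/* Choose the value governing one category, given the values of LC_ALL,
   the category's own variable (such as LC_TIME), and LANG; NULL means
   "not set".  Returns NULL when the implementation default applies. */
static const char *resolve_category(const char *lc_all,
                                    const char *lc_category,
                                    const char *lang)
{
    if (is_set(lc_all))
        return lc_all;        /* LC_ALL overrides everything else */
    if (is_set(lc_category))
        return lc_category;   /* then the category's own variable */
    if (is_set(lang))
        return lang;          /* then LANG */
    return NULL;
}

/* Convenience wrapper reading the three variables from the environment. */
static const char *resolve_from_env(const char *category_var)
{
    return resolve_category(getenv("LC_ALL"), getenv(category_var),
                            getenv("LANG"));
}

static enum locale_kind classify_locale(const char *value)
{
    if (value == NULL)
        return LOCALE_DEFAULT;
    if (strcmp(value, "C") == 0 || strcmp(value, "POSIX") == 0)
        return LOCALE_POSIX;
    if (value[0] == '/')
        return LOCALE_PATHNAME;
    return LOCALE_OTHER;
}
```

A C utility would not normally implement this itself. As the rationale in 2.6.1 notes, it calls `setlocale(LC_ALL, "")`, and the C library then applies this selection to every category.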
If LANG (or any of the LC_* environment variables) contains one of a set of implementation-defined values, the standard utilities shall behave in accordance with the rules in a corresponding implementation-defined locale description for the associated category.

If LANG (or any of the LC_* environment variables) contains a value that the implementation does not recognize, the behavior is unspecified.

Additional criteria for determining a valid locale name are implementation defined.

BEGIN_RATIONALE

2.6.1 Environment Variables Rationale. (This subclause is not a part of P1003.2)

The standard is worded so that the specified variables may be provided to the application. There is no way that the implementation can guarantee that a utility will ever see an environment variable, as a parent process can change the environment for its children. The env -i command in this standard and the POSIX.1 {8} exec family both offer ways to remove any of these variables from the environment.

The language about locale implies that any utilities written in Standard C and conforming to POSIX.2 must issue the following call:

    setlocale(LC_ALL, "")

If this were omitted, the C Standard {7} specifies that the C Locale would be used.

If any of the environment variables is invalid, it makes sense to default to an implementation-defined, consistent locale environment. It is more confusing for a user to have partial settings occur in case of a mistake. All utilities would then behave in one language/cultural environment. Furthermore, it provides a way of forcing the whole environment to be the implementation-defined default. Disastrous results could occur if a pipeline of utilities partially used the environment variables in different ways.
In this case, it would be appropriate for utilities that use LANG and related variables to exit with an error if any of the variables are invalid. On the other hand, users typing individual commands at a terminal might want date to work if LC_MONETARY is invalid, as long as LC_TIME is valid. Since these are conflicting reasonable alternatives, POSIX.2 leaves the results unspecified if the locale environment variables would not produce a complete locale matching the user's specification.

The locale settings of individual categories cannot be truly independent and still guarantee correct results. For example, when collating two strings, characters must first be extracted from each string (governed by LC_CTYPE) before being mapped to collating elements (governed by LC_COLLATE) for comparison. That is, if LC_CTYPE is causing parsing according to the rules of a large, multibyte code set (potentially returning 20000 or more distinct character code set values), but LC_COLLATE is set to handle only an 8-bit code set with 256 distinct characters, meaningful results are obviously impossible.

The LC_MESSAGES variable affects the language of messages generated by the standard utilities. This standard does not provide a means whereby applications can easily be written to perform similar feats. Future versions of POSIX.1 {8} and POSIX.2 are expected to provide both functions and utilities to accomplish multilanguage messaging (using message catalogs), but such facilities were not ready for standardization at the time the initial versions of the standards were developed.

This clause is not a full list of all environment variables, but only those of importance to multiple utilities.
Nevertheless, to satisfy some members of the balloting group, here is a list of the other environment variable symbols mentioned in this standard:

    Variable    Utility     Variable    Utility
    ________    _______     ________    _______
    CDPATH      cd          MAKEFLAGS   make
    COLUMNS     ls          OPTARG      getopts
    DEAD        mailx       OPTIND      getopts
    IFS         sh          PRINTER     lp
    LPDEST      lp          PS1         sh
    MAIL        sh          PS2         sh
    MAILRC      mailx

The description of PATH is similar to that in POSIX.1 {8}, except:

  - The behavior of a null prefix is marked obsolescent in favor of using a real pathname. This was done at the behest of some members of the balloting group, who apparently felt it offered a more secure environment, where the current directory would not be selected unintentionally.

  - The POSIX.1 {8} exec description requires an implementation-defined path search when PATH is ``not present.'' POSIX.2 spells out that this means ``unset or set to null.'' Many implementations historically have used a default value of /bin and /usr/bin. POSIX.2 does not mandate that this default path be identical to that retrieved from getconf _CS_PATH because it is likely that a transition to POSIX.2 conformance will see the newly standardized utilities in another directory that needs to be isolated from some historical applications.

  - The POSIX.1 {8} PATH description is ambiguous about whether an ``executable file'' means one that has the appropriate permissions for the searching process to execute it. One reading would say that a file with any of the execution bits set would satisfy the search and that an [EACCES] could be returned at that point. This is not the way historical systems work, and POSIX.2 has clarified it to mean that the path search will continue until it finds the name with the execute permissions that would allow the process to execute it.
    (The case of the [ENOEXEC] error is handled in the text of 3.9.1.1.)

The terminology ``beginning to end'' is used in PATH to avoid the noninternationalized ``left to right.''

There is no way to have a colon character embedded within a pathname that is part of the PATH variable string. Colon is not a member of the portable filename character set, so this should not be a problem.

A portable application can retrieve a default PATH value (one that will allow access to all the standard utilities) from the system using the command:

    getconf _CS_PATH

See the rationale with command for an example of using this.

The SHELL variable names the user's preferred shell; it is a guide to applications. There is no direct requirement that that shell conform to this standard; that decision should rest with the user. It is the intention of the developers of this standard that alternative shells be permitted, if the user chooses to develop or acquire one. An operating system that builds its shell into the ``kernel'' in such a manner that alternative shells would be impossible does not conform to the spirit of the standard.

The following environment variables are not currently used by the standard utilities (although they may be by future UPE utilities). Implementations should reserve the names for the following purposes:

EDITOR
    The name of the user's preferred text file editor. The value of this variable is the name of a utility: either a pathname containing a slash, or a filename to be located using the PATH environment variable.

VISUAL
    The name of the user's preferred ``visual,'' or full-screen, text file editor. The value of this variable is the name of a utility: either a pathname containing a slash, or a filename to be located using the PATH environment variable.
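A value such as that of EDITOR has to be resolved by the PATH rules in 2.6. A name containing a slash is used as given. Otherwise, prefixes are tried from beginning to end, with an inserted slash, and the search continues past files that exist but cannot be executed. The C sketch below illustrates this. It is an approximation under stated assumptions. `find_utility()` and `is_executable_file()` are hypothetical names. Executability is checked with `stat()` plus `access(X_OK)`, which uses the real rather than the effective user ID. The implementation-defined search for an unset or null PATH is not modeled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if pathname names a regular file that this process may execute.
   access() checks the real user and group IDs, a simplification. */
static int is_executable_file(const char *pathname)
{
    struct stat st;
    return stat(pathname, &st) == 0 && S_ISREG(st.st_mode)
           && access(pathname, X_OK) == 0;
}

/* Hypothetical helper: resolve a utility name (for example the value of
   EDITOR) following the PATH rules of 2.6.  Writes the result into buf
   and returns buf, or returns NULL if nothing suitable is found. */
static const char *find_utility(const char *name, const char *path,
                                char *buf, size_t bufsize)
{
    assert(name != NULL && buf != NULL && bufsize > 0);

    if (strchr(name, '/') != NULL) {
        /* A name containing a slash is not searched for. */
        int n = snprintf(buf, bufsize, "%s", name);
        return (n >= 0 && (size_t)n < bufsize) ? buf : NULL;
    }
    if (path == NULL || path[0] == '\0')
        return NULL;  /* unset or null PATH: implementation-defined, not modeled */

    const char *prefix = path;
    for (;;) {
        const char *colon = strchr(prefix, ':');
        size_t len = colon != NULL ? (size_t)(colon - prefix) : strlen(prefix);
        int n;

        if (len == 0)  /* obsolescent zero-length prefix: current directory */
            n = snprintf(buf, bufsize, "%s", name);
        else           /* insert a slash between prefix and filename */
            n = snprintf(buf, bufsize, "%.*s/%s", (int)len, prefix, name);

        /* Keep searching past files that exist but are not executable. */
        if (n >= 0 && (size_t)n < bufsize && is_executable_file(buf))
            return buf;
        if (colon == NULL)
            return NULL;
        prefix = colon + 1;
    }
}
```

For example, a search with the name sh and the PATH value /nonexistent-dir:/bin skips the missing directory and, on a typical system, yields /bin/sh. A directory such as /usr/bin is never accepted as a match, because it is not a regular file.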
The decision to restrict conforming systems to the use of digits, uppercase letters, and underscores for environment variable names allows applications to use lowercase letters in their environment variable names without conflicting with any conforming system.

PROCLANG was added to an earlier draft for internationalized applications, but was removed from the standard because the working group determined that it was not of use.

USER was removed from an earlier draft because it was an unreasonable duplication of LOGNAME.

END_RATIONALE

2.7 Required Files

The following directories shall exist on conforming systems and shall be used as described. Strictly Conforming POSIX.2 Applications shall not assume the ability to create files in any of these directories.

/
    The root directory.

/dev
    Contains /dev/null and /dev/tty, described below.

The following directory shall exist on conforming systems and shall be used as described.

/tmp
    A directory made available for programs that need a place to create temporary files. Applications shall be allowed to create files in this directory, but shall not assume that such files are preserved between invocations of the application.

The following files shall exist on conforming systems and shall be both readable and writable.

/dev/null
    An infinite data source/sink. Data written to /dev/null is discarded. Reads from /dev/null always return end-of-file (EOF).

/dev/tty
    In each process, a synonym for the controlling terminal associated with the process group of that process, if any. It is useful for programs or shell procedures that wish to be sure of writing messages to or reading data from the terminal no matter how output has been redirected.

BEGIN_RATIONALE

2.7.1 Required Files Rationale.
(This subclause is not a part of P1003.2)

A description of the historical /usr/tmp was omitted, removing any concept of differences in emphasis between the / and /usr versions. The descriptions of /bin, /usr/bin, /lib, and /usr/lib were omitted because they are not useful for applications. In an early draft, a distinction was made between system and application directory usage, but this was not found to be useful.

In Draft 8, /, /dev, /local, /usr/local, and /usr/man were removed. The directories / and /dev were restored in Draft 9. It was pointed out by several balloters that the notion of a hierarchical directory structure is key to other information presented in later sections of the standard. (Previously, some had argued that special devices and temporary files could conceivably be handled without a directory structure on some implementations. For example, the system could treat the characters ``/tmp'' as a special token that would store files using some non-POSIX file system structure. This notion was rejected by the working group, which requires that all the files in this clause be implemented via POSIX file systems.)

The /tmp directory is retained in the standard to accommodate historical applications that assume its availability. Future implementations are encouraged to provide suitable directory names in TMPDIR and future applications are encouraged to use the contents of TMPDIR for creating temporary files.

The standard files /dev/null and /dev/tty are required to be both readable and writable to allow applications to have the intended historical access to these files.

END_RATIONALE
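The rationale encourages applications to create temporary files in the directory named by TMPDIR, and /tmp is the directory that 2.7 guarantees. The following C sketch prefers TMPDIR and falls back to /tmp. It is a sketch, not a prescribed method. `temp_dir()` and `make_temp_file()` are hypothetical names. The sketch uses `mkstemp()`, an interface standardized in later POSIX revisions rather than in this draft. It does not check that TMPDIR names a usable directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Directory for temporary files: TMPDIR if set and non-empty, otherwise
   /tmp, which 2.7 requires to exist.  Validating TMPDIR (that it names
   a writable directory) is left out of this sketch. */
static const char *temp_dir(void)
{
    const char *dir = getenv("TMPDIR");
    return (dir != NULL && dir[0] != '\0') ? dir : "/tmp";
}

/* Create a new, uniquely named file in temp_dir().  Its pathname is
   stored in buf; returns an open file descriptor, or -1 on failure. */
static int make_temp_file(char *buf, size_t bufsize, const char *prefix)
{
    assert(buf != NULL && prefix != NULL);
    int n = snprintf(buf, bufsize, "%s/%sXXXXXX", temp_dir(), prefix);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    /* mkstemp() replaces XXXXXX and creates the file as if by open()
       with O_CREAT | O_EXCL, so an existing file is never reused. */
    return mkstemp(buf);
}
```

Because files in these directories are not guaranteed to survive between invocations, a caller should remove the file with `unlink()` when it is no longer needed.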
2.8 Regular Expression Notation

Editor's Note: The entire rationale for this clause appears at the end of the clause.

Regular Expressions (REs) provide a mechanism to select specific strings from a set of character strings.

Regular expressions are a context-independent syntax that can represent a wide variety of character sets and character set orderings, where these character sets are interpreted according to the current locale. While many regular expressions can be interpreted differently depending on the current locale, many features, such as character class expressions, provide for contextual invariance across locales.

The Basic Regular Expression (BRE) notation and construction rules in 2.8.3 shall apply to most utilities supporting regular expressions. Some utilities, instead, support the Extended Regular Expressions (ERE) described in 2.8.4; any exceptions for both cases are noted in the descriptions of the specific utilities using regular expressions. Both BREs and EREs are supported by the Regular Expression Matching interface in 7.3.

2.8.1 Regular Expression Definitions

For the purposes of this clause, the following definitions apply.

2.8.1.1 entire regular expression: The concatenated set of one or more BREs or EREs that make up the pattern specified for string selection.

2.8.1.2 matched: A sequence of zero or more characters is said to be matched by a BRE or ERE when the characters in the sequence correspond to a sequence of characters defined by the pattern.

Matching shall be based on the bit pattern used for encoding the character, not on the graphic representation of the character.
The search for a matching sequence shall start at the beginning of a string and stop when the first sequence matching the expression is found, where ``first'' is defined to mean ``begins earliest in the string.'' If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the longest such sequence shall be matched. For example, the BRE bb* matches the second through fourth characters of abbbc, and the ERE (wee|week)(knights|night) matches all ten characters of weeknights.

Consistent with the whole match being the longest of the leftmost matches, each subpattern, from left to right, shall match the longest possible string. For this purpose, a null string shall be considered to be longer than no match at all. For example, matching the BRE \(.*\).* against abcdef, the subexpression (\1) is abcdef, and matching the BRE \(a*\)* against bc, the subexpression (\1) is the null string.

When a multicharacter collating element in a bracket expression (see 2.8.3.2) is involved, the longest sequence shall be measured in characters consumed from the string to be matched; i.e., the collating element counts not as one element, but as the number of characters it matches.

2.8.1.3 BRE [ERE] matching a single character: A BRE or ERE that matches either a single character or a single collating element.

Only a BRE or ERE of this type that includes a bracket expression (see 2.8.3.2) can match a collating element.

2.8.1.4 BRE [ERE] matching multiple characters: A BRE or ERE that matches a concatenation of single characters or collating elements.
Such a BRE or ERE is made up from a BRE (ERE) matching a single character and BRE (ERE) special characters.

2.8.2 Regular Expression General Requirements

The requirements in this subclause shall apply to both basic and extended regular expressions.

The use of regular expressions is generally associated with text processing. REs (BREs and EREs) operate on text strings, that is, zero or more characters followed by an end-of-string delimiter (typically NUL). Some utilities employing regular expressions limit the processing to lines, that is, zero or more characters followed by a <newline>. In the regular expression processing described in this standard, the <newline> character is regarded as an ordinary character. This standard specifies within the individual descriptions of those standard utilities employing regular expressions whether they permit matching of <newline>s; if not stated otherwise, the use of literal <newline>s or any escape sequence equivalent produces undefined results.

The interfaces specified in this standard do not permit the inclusion of a NUL character in an RE or in the string to be matched. If during the operation of a standard utility a NUL is included in the text designated to be matched, that NUL may designate the end of the text string for the purposes of matching.

When a standard utility or function that uses regular expressions specifies that pattern matching shall be performed without regard to the case (upper- or lower-) of either data or patterns, then when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched.

The implementation shall support any regular expression that does not exceed 256 bytes in length.
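Two rules above can be checked with a C regular-expression interface: the leftmost-longest rule of 2.8.1.2 and the requirement for case-insensitive matching. The sketch below assumes the `regcomp()`/`regexec()` interface that later POSIX revisions provide in `<regex.h>`, rather than quoting 7.3 of this draft. `match_span()` is a hypothetical helper. It reports zero-based offsets, and the end offset is one past the last matched character.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdbool.h>

/* Compile pattern with the given regcomp() flags (0 for a BRE,
   REG_EXTENDED for an ERE, optionally combined with REG_ICASE) and run
   it once against subject.  On a match, store the offsets of the whole
   match in *start and *end (one past the last matched character) and
   return true.  An invalid pattern or no match returns false. */
static bool match_span(const char *pattern, int cflags, const char *subject,
                       regoff_t *start, regoff_t *end)
{
    regex_t re;
    regmatch_t whole[1];

    if (regcomp(&re, pattern, cflags) != 0)
        return false;
    bool matched = regexec(&re, subject, 1, whole, 0) == 0;
    regfree(&re);
    if (matched) {
        *start = whole[0].rm_so;
        *end = whole[0].rm_eo;
    }
    return matched;
}
```

With this helper, the BRE bb* against abbbc yields offsets 1 to 4, which are the second through fourth characters. The ERE (wee|week)(knights|night) against weeknights yields 0 to 10. The BRE posix does not match POSIX.2 until REG_ICASE is added, after which each pattern character also matches its case counterpart.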
This clause uses the term ``invalid'' for certain constructs or conditions. Invalid REs shall cause the utility or function using the RE to generate an error condition. When ``invalid'' is not used, violations of the specified syntax or semantics for REs produce undefined results: this may entail an error, enabling an extended syntax for that RE, or using the construct in error as literal characters to be matched.

2.8.3 Basic Regular Expressions

2.8.3.1 BREs Matching a Single Character or Collating Element

A BRE ordinary character, a special character preceded by a backslash, or a period shall match a single character. A bracket expression shall match a single character or a single collating element.

2.8.3.1.1 BRE Ordinary Characters

An ordinary character is a BRE that matches itself: any character in the supported character set, except for the BRE special characters listed in 2.8.3.1.2.

The interpretation of an ordinary character preceded by a backslash (\) is undefined, except for:

  (1) The characters ), (, {, and }.

  (2) The digits 1 through 9 (see 2.8.3.3).

  (3) A character inside a bracket expression.

2.8.3.1.2 BRE Special Characters

A BRE special character has special properties in certain contexts. Outside of those contexts, or when preceded by a backslash, such a character shall be a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are:

. [ \
    The period, left-bracket, and backslash shall be special except when used in a bracket expression (see 2.8.3.2). An expression containing a [ that is not preceded by a backslash and is not part of a bracket expression produces undefined results.
*
    The asterisk is special except when used:
      - In a bracket expression,
      - As the first character of an entire BRE (after an initial ^, if any), or
      - As the first character of a subexpression (after an initial ^, if any); see 2.8.3.3.

^
    The circumflex shall be special when used:
      - As an anchor (see 2.8.3.5), or
      - As the first character of a bracket expression (see 2.8.3.2).

$
    The dollar-sign shall be special when used as an anchor.

2.8.3.1.3 Periods in BREs

A period (.), when used outside of a bracket expression, is a BRE that shall match any character in the supported character set except NUL.

2.8.3.2 RE Bracket Expression

A bracket expression (an expression enclosed in square brackets, []) is an RE that matches a single collating element contained in the nonempty set of collating elements represented by the bracket expression.

The following rules and definitions apply to bracket expressions:

  (1) A bracket expression is either a matching list expression or a nonmatching list expression. It consists of one or more expressions: collating elements, collating symbols, equivalence classes, character classes, or range expressions. Strictly Conforming POSIX.2 Applications shall not use range expressions, but conforming implementations shall support regular expressions containing range expressions. The right-bracket (]) shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list [after an initial circumflex (^), if any]. Otherwise, it shall terminate the bracket expression, unless it appears in a collating symbol (such as [.].]) or is the ending right-bracket for a collating symbol, equivalence class, or character class. The special characters .
      * [ \ (period, asterisk, left-bracket, and backslash, respectively) shall lose their special meaning within a bracket expression.

      The character sequences [. [= [: (left-bracket followed by a period, equals-sign, or colon) shall be special inside a bracket expression and are used to delimit collating symbols, equivalence class expressions, and character class expressions. These symbols shall be followed by a valid expression and the matching terminating sequence .], =], or :], as described in the following items.

  (2) A matching list expression specifies a list that shall match any one of the expressions represented in the list. The first character in the list shall not be the circumflex. For example, [abc] is an RE that matches any of a, b, or c.

  (3) A nonmatching list expression begins with a circumflex (^), and specifies a list that shall match any character or collating element except for the expressions represented in the list after the leading circumflex. For example, [^abc] is an RE that matches any character or collating element except a, b, or c. The circumflex shall have this special meaning only when it occurs first in the list, immediately following the left-bracket.

  (4) A collating symbol is a collating element enclosed within bracket-period ([. .]) delimiters. Collating elements are defined as described in 2.5.2.2.4. Multicharacter collating elements shall be represented as collating symbols when it is necessary to distinguish them from a list of the individual characters that make up the multicharacter collating element. For example, if the string ch is a collating element in the current collation sequence with an associated collating symbol, the expression [[.ch.]] shall be treated as an RE matching the character sequence ch, while [ch] shall be treated as an RE matching c or h. Collating symbols shall be recognized only inside bracket expressions.
      This implies that the RE [[.ch.]]*c shall match the first through fifth characters in the string chchch. If the string is not a collating element in the current collating sequence definition, or if the collating element has no characters associated with it (e.g., a symbol declared with collating-symbol; see the example collation definition shown in 2.5.2.2.4), the symbol shall be treated as an invalid expression.

  (5) An equivalence class expression shall represent the set of collating elements belonging to an equivalence class, as described in 2.5.2.2.4. Only primary equivalence classes shall be recognized. The class shall be expressed by enclosing any one of the collating elements in the equivalence class within bracket-equal ([= =]) delimiters. For example, if a, à, and â belong to the same equivalence class, then [[=a=]b], [[=à=]b], and [[=â=]b] shall each be equivalent to [aàâb]. If the collating element does not belong to an equivalence class, the equivalence class expression shall be treated as a collating symbol.

  (6) A character class expression shall represent the set of characters belonging to a character class, as defined in the LC_CTYPE category in the current locale. All character classes specified in the current locale shall be recognized. A character class expression shall be expressed as a character class name enclosed within ``bracket-colon'' ([: :]) delimiters.
Strictly conforming POSIX.2 applications shall only use the following character class expressions, which shall be supported on all conforming implementations:

    [:alnum:]   [:cntrl:]   [:lower:]   [:space:]
    [:alpha:]   [:digit:]   [:print:]   [:upper:]
    [:blank:]   [:graph:]   [:punct:]   [:xdigit:]

(7) A range expression represents the set of collating elements that fall between two elements in the current collation sequence, inclusively. It shall be expressed as the starting point and the ending point separated by a hyphen (-). Range expressions shall not be used in Strictly Conforming POSIX.2 Applications because their behavior is dependent on the collating sequence. Range expressions shall be supported by conforming implementations.

In the following, all examples assume the collation sequence specified for the POSIX Locale, unless another collation sequence is specifically defined.

The starting range point and the ending range point shall be a collating element or collating symbol. An equivalence class expression used as a starting or ending point of a range expression produces unspecified results. The ending range point shall collate equal to or higher than the starting range point; otherwise the expression shall be treated as invalid. The order used is the order in which the collating elements are specified in the current collation definition. One-to-many mappings (see 2.5.2.2) shall not be performed. For example, assuming that the character eszet (ß) is placed in the basic collation sequence after r and s, but before t, and that it maps to the sequence ss for collation purposes, then the expression [r-s] matches only r and s, but the expression [s-t] matches s, ß, or t.
The interpretation of range expressions where the ending range point is also the starting range point of a subsequent range expression is undefined.

The hyphen character shall be treated as itself if it occurs first (after an initial ^, if any) or last in the list, or as an ending range point in a range expression. As examples, the expressions [-ac] and [ac-] are equivalent and match any of the characters a, c, or -; the expressions [^-ac] and [^ac-] are equivalent and match any character except a, c, or -; the expression [%--] matches any of the characters between % and -, inclusive; the expression [--@] matches any of the characters between - and @, inclusive; and the expression [a--@] is invalid, because the letter a follows the symbol - in the POSIX Locale. To use a hyphen as the starting range point, it shall either come first in the bracket expression or be specified as a collating symbol. For example, [][.-.]-0] matches either a right bracket or any character or collating element that collates between hyphen and 0, inclusive.

2.8.3.3 BREs Matching Multiple Characters

The following rules can be used to construct BREs matching multiple characters from BREs matching a single character:

(1) The concatenation of BREs shall match the concatenation of the strings matched by each component of the BRE.

(2) A subexpression can be defined within a BRE by enclosing it between the character pairs \( and \). Such a subexpression shall match whatever it would have matched without the \( and \), except that anchoring within subexpressions is optional behavior; see 2.8.3.5. Subexpressions can be arbitrarily nested.

(3) The backreference expression \n shall match the same (possibly empty) string of characters as was matched by a subexpression enclosed between \( and \) preceding the \n.
The character n shall be a digit from 1 through 9, specifying the n-th subexpression [the one that begins with the n-th \( and ends with the corresponding paired \)]. The expression is invalid if fewer than n subexpressions precede the \n. For example, the expression ^\(.*\)\1$ matches a line consisting of two adjacent appearances of the same string, and the expression \(a\)*\1 fails to match a.

(4) When a BRE matching a single character, a subexpression, or a backreference is followed by the special character asterisk (*), together with that asterisk it shall match what zero or more consecutive occurrences of the BRE would match. For example, [ab]* and [ab][ab] are equivalent when matching the string ab.

(5) When a BRE matching a single character, a subexpression, or a backreference is followed by an interval expression of the format \{m\}, \{m,\}, or \{m,n\}, together with that interval expression it shall match what repeated consecutive occurrences of the BRE would match. The values of m and n shall be decimal integers in the range 0 <= m <= n <= {RE_DUP_MAX}, where m specifies the exact or minimum number of occurrences and n specifies the maximum number of occurrences. The expression \{m\} shall match exactly m occurrences of the preceding BRE, \{m,\} shall match at least m occurrences, and \{m,n\} shall match any number of occurrences between m and n, inclusive.

For example, in the string abababccccccd the BRE c\{3\} is matched by characters seven through nine, the BRE \(ab\)\{4,\} is not matched at all, and the BRE c\{1,3\}d is matched by characters ten through thirteen. Multiple adjacent duplication symbols (* and intervals) produce undefined results.
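The backreference and interval examples above can be checked through the POSIX regcomp() and regexec() C interface. The following is a minimal sketch, assuming a POSIX system that provides <regex.h>; the helper bre_search is hypothetical, not part of any standard interface. Offsets are reported as regexec() reports them: rm_so is the 0-based index of the first matched character and rm_eo is one past the last, so ``characters seven through nine'' correspond to rm_so == 6 and rm_eo == 9.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` as a BRE (cflags 0, i.e. no
   REG_EXTENDED) and search `subject`.  On a match, store the offsets
   of the whole match: *so is the 0-based index of the first matched
   character, *eo is one past the last.  An invalid BRE is reported
   as "no match". */
static bool bre_search(const char *pattern, const char *subject,
                       regoff_t *so, regoff_t *eo)
{
    regex_t re;
    regmatch_t m[1];
    bool found;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, 0) != 0)
        return false;
    found = (regexec(&re, subject, 1, m, 0) == 0);
    if (found) {
        *so = m[0].rm_so;
        *eo = m[0].rm_eo;
    }
    regfree(&re);
    return found;
}
```

In C string literals each backslash of the BRE is doubled, so c\{1,3\}d is written "c\\{1,3\\}d"; searching abababccccccd with it should report offsets 9 and 13, that is, characters ten through thirteen.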
2.8.3.4 BRE Precedence

The order of precedence shall be as shown in Table 2-12, from high to low.

Table 2-12 - BRE Precedence

    ________________________________________________________
    collation-related bracket symbols    [= =]  [: :]  [. .]
    escaped characters                   \<special character>
    bracket expression                   [ ]
    subexpressions/backreferences        \( \)  \n
    single-character-BRE duplication     *  \{m,n\}
    concatenation
    anchoring                            ^  $
    ________________________________________________________

2.8.3.5 BRE Expression Anchoring

A BRE can be limited to matching strings that begin or end a line; this is called anchoring. The circumflex and dollar-sign special characters shall be considered BRE anchors in the following contexts:

(1) A circumflex (^) shall be an anchor when used as the first character of an entire BRE. The implementation may treat circumflex as an anchor when used as the first character of a subexpression. The circumflex shall anchor the expression (or optionally subexpression) to the beginning of a string; only sequences starting at the first character of a string shall be matched by the BRE. For example, the BRE ^ab matches ab in the string abcdef, but fails to match in the string cdefab. The BRE \(^ab\) may match the former string. A portable BRE shall escape a leading circumflex in a subexpression to match a literal circumflex.
(2) A dollar-sign ($) shall be an anchor when used as the last character of an entire BRE. The implementation may treat a dollar-sign as an anchor when used as the last character of a subexpression. The dollar-sign shall anchor the expression (or optionally subexpression) to the end of the string being matched; the dollar-sign can be said to match the ``end-of-string'' following the last character.

(3) A BRE anchored by both ^ and $ shall match only an entire string. For example, the BRE ^abcdef$ matches strings consisting only of abcdef.

2.8.4 Extended Regular Expressions

The extended regular expression (ERE) notation and construction rules shall apply to utilities defined as using extended regular expressions; any exceptions to the following rules are noted in the descriptions of the specific utilities using EREs.

2.8.4.1 EREs Matching a Single Character or Collating Element

An ERE ordinary character, a special character preceded by a backslash, or a period shall match a single character. A bracket expression shall match a single character or a single collating element. An ERE matching a single character enclosed in parentheses shall match the same as the ERE without parentheses would have matched.

2.8.4.1.1 ERE Ordinary Characters

An ordinary character is an ERE that matches itself. An ordinary character is any character in the supported character set, except for the ERE special characters listed in 2.8.4.1.2. The interpretation of an ordinary character preceded by a backslash (\) is undefined.

2.8.4.1.2 ERE Special Characters

An ERE special character has special properties in certain contexts.
Outside of those contexts, or when preceded by a backslash, such a character shall be an ERE that matches the special character itself. The extended regular expression special characters and the contexts in which they shall have their special meaning are:

. [ \ (    The period, left-bracket, backslash, and left-parenthesis are special except when used in a bracket expression (see 2.8.3.2).

* + ? {    The asterisk, plus-sign, question-mark, and left-brace are special except when used in a bracket expression (see 2.8.3.2). Any of the following uses produces undefined results:
           - If these characters appear first in an ERE, or immediately following a vertical-line, circumflex, or left-parenthesis.
           - If a left-brace is not part of a valid interval expression.

|          The vertical-line is special except when used in a bracket expression (see 2.8.3.2). A vertical-line appearing first or last in an ERE, or immediately following a vertical-line or a left-parenthesis, produces undefined results.

^          The circumflex shall be special when used:
           - As an anchor (see 2.8.4.6), or
           - As the first character of a bracket expression (see 2.8.3.2).

$          The dollar-sign shall be special when used as an anchor.

2.8.4.1.3 Periods in EREs

A period (.), when used outside of a bracket expression, is an ERE that shall match any character in the supported character set except NUL.

2.8.4.2 ERE Bracket Expression

The rules for ERE bracket expressions are the same as for basic regular expressions; see 2.8.3.2.
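Because ERE bracket expressions follow the rules of 2.8.3.2, the bracket expression examples given there can be tried with regcomp() and the REG_EXTENDED flag. This is a minimal sketch, assuming a POSIX <regex.h> and a program running in the default POSIX (C) locale, since range results depend on the collation sequence; the helper names ere_matches and ere_is_invalid are hypothetical. Each pattern in the checks is wrapped in ^ and $ so that the bracket expression must account for the whole subject.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if `pattern` compiles as an ERE and
   matches somewhere in `subject`.  REG_NOSUB is used because only a
   yes/no answer is needed. */
static bool ere_matches(const char *pattern, const char *subject)
{
    regex_t re;
    bool found;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    found = (regexec(&re, subject, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}

/* Hypothetical helper: true if regcomp() rejects `pattern` as an ERE. */
static bool ere_is_invalid(const char *pattern)
{
    regex_t re;
    int rc;

    assert(pattern != NULL);
    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc == 0)
        regfree(&re);   /* free only what was successfully compiled */
    return rc != 0;
}
```

For example, ere_matches("^[%--]$", "+") should succeed in the POSIX locale, because + collates between % and -, while ere_is_invalid("[a--@]") should report the reversed range as invalid.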
2.8.4.3 EREs Matching Multiple Characters

The following rules shall be used to construct EREs matching multiple characters from EREs matching a single character:

(1) A concatenation of EREs shall match the concatenation of the character sequences matched by each component of the ERE. A concatenation of EREs enclosed in parentheses shall match whatever the concatenation without the parentheses matches. For example, both the ERE cd and the ERE (cd) are matched by the third and fourth characters of the string abcdefabcdef.

(2) When an ERE matching a single character, or a concatenation of EREs enclosed in parentheses, is followed by the special character plus-sign (+), together with that plus-sign it shall match what one or more consecutive occurrences of the ERE would match. For example, the ERE b+(bc) matches the fourth through seventh characters in the string acabbbcde. And, [ab]+ and [ab][ab]* are equivalent.

(3) When an ERE matching a single character, or a concatenation of EREs enclosed in parentheses, is followed by the special character asterisk (*), together with that asterisk it shall match what zero or more consecutive occurrences of the ERE would match. For example, the ERE b*c matches the first character in the string cabbbcde, and the ERE b*cd matches the third through seventh characters in the string cabbbcdebbbbbbcdbc. And, [ab]* and [ab][ab] are equivalent when matching the string ab.

(4) When an ERE matching a single character, or a concatenation of EREs enclosed in parentheses, is followed by the special character question-mark (?), together with that question-mark it shall match what zero or one consecutive occurrences of the ERE would match. For example, the ERE b?c matches the second character in the string acabbbcde.

(5) When an ERE matching a single character, or a concatenation of EREs enclosed in parentheses, is followed by an interval expression of the format {m}, {m,}, or {m,n}, together with that interval expression it shall match what repeated consecutive occurrences of the ERE would match. The values of m and n shall be decimal integers in the range 0 <= m <= n <= {RE_DUP_MAX}, where m specifies the exact or minimum number of occurrences and n specifies the maximum number of occurrences. The expression {m} shall match exactly m occurrences of the preceding ERE, {m,} shall match at least m occurrences, and {m,n} shall match any number of occurrences between m and n, inclusive.

For example, in the string abababccccccd the ERE c{3} is matched by characters seven through nine, and the ERE (ab){2,} is matched by characters one through six.

Multiple adjacent duplication symbols (+, *, ?, and intervals) produce undefined results.

2.8.4.4 ERE Alternation

Two EREs separated by the special character vertical-line (|) shall match a string that is matched by either. For example, the ERE a((bc)|d) matches the string abc and the string ad. Single characters, or expressions matching single characters, separated by the vertical bar and enclosed in parentheses, shall be treated as an ERE matching a single character.

2.8.4.5 ERE Precedence

The order of precedence shall be as shown in Table 2-13, from high to low.

Table 2-13 - ERE Precedence

    ________________________________________________________
    collation-related bracket symbols    [= =]  [: :]  [. .]
    escaped characters                   \<special character>
    bracket expression                   [ ]
    grouping                             ( )
    single-character-ERE duplication     *  +  ?  {m,n}
    concatenation
    anchoring                            ^  $
    alternation                          |
    ________________________________________________________

For example, the ERE abba|cde matches either the string abba or the string cde (because concatenation has a higher order of precedence than alternation).

2.8.4.6 ERE Expression Anchoring

An ERE can be limited to matching strings that begin or end a line; this is called anchoring. The circumflex and dollar-sign special characters shall be considered ERE anchors in the following contexts:

(1) A circumflex (^) shall be an anchor when used anywhere outside a bracket expression. The circumflex shall anchor the (sub)expression to the beginning of a string; only sequences starting at the first character of a string shall be matched by the ERE. For example, the EREs ^ab and (^ab) match ab in the string abcdef, but fail to match in the string cdefab.

(2) A dollar-sign ($) shall be an anchor when used anywhere outside a bracket expression. It shall anchor the expression to the end of the string being matched; the dollar-sign can be said to match the ``end-of-string'' following the last character.

(3) An ERE anchored by both ^ and $ shall match only an entire string. For example, the EREs ^abcdef$ and (^abcdef$) match strings consisting only of abcdef.

2.8.5 Regular Expression Grammar

Grammars describing the syntax of both basic and extended regular expressions are presented in this subclause. See the grammar conventions in 2.1.2.

2.8.5.1 BRE/ERE Grammar Lexical Conventions

The lexical conventions for regular expressions shall be as described in this subclause.
Except as noted, the longest possible token or delimiter beginning at a given point shall be recognized. The following tokens shall be processed (in addition to those string constants shown in the grammar):

COLL_ELEM     Shall be any single-character collating element, unless it is a META_CHAR.

BACKREF       (Applicable only to basic regular expressions.) Shall be the character string consisting of '\' followed by a single-digit numeral, 1 through 9.

DUP_COUNT     Shall represent a numeric constant. It shall be an integer in the range 0 <= DUP_COUNT <= {RE_DUP_MAX}. This token shall only be recognized when the context of the grammar requires it. At all other times, digits not preceded by '\' shall be treated as ORD_CHAR.

META_CHAR     Shall be one of the characters:
              ^    When found first in a bracket expression
              -    When found anywhere but first (after an initial ^, if any) or last in a bracket expression, or as the ending range point in a range expression
              ]    When found anywhere but first (after an initial ^, if any) in a bracket expression

L_ANCHOR      (Applicable only to basic regular expressions.) Shall be the character ^ when it appears as the first character of a basic regular expression and when not QUOTED_CHAR. The ^ may be recognized as an anchor elsewhere; see 2.8.3.5.

ORD_CHAR      Shall be a character, other than one of the special characters in SPEC_CHAR.

QUOTED_CHAR   Shall be one of the character sequences:
              \^   \.   \*   \[   \$   \\

R_ANCHOR      (Applicable only to basic regular expressions.) Shall be the character $ when it appears as the last character of a basic regular expression and when not QUOTED_CHAR. The $ may be recognized as an anchor elsewhere; see 2.8.3.5.

SPEC_CHAR     For basic regular expressions, shall be one of the following special characters:
              .    Anywhere outside bracket expressions
              \    Anywhere outside bracket expressions
              [    Anywhere outside bracket expressions
              ^    When an anchor; see 2.8.3.5
              $    When an anchor; see 2.8.3.5
              *    Anywhere except: first in an entire RE; anywhere in a bracket expression; directly following \(; directly following an anchoring ^.
              For extended regular expressions, shall be one of the following special characters found anywhere outside bracket expressions:
              ^  .  [  $  (  )  |  *  +  ?  {  \
              (The close-parenthesis shall be considered special in this context only if matched with a preceding open-parenthesis.)

2.8.5.2 RE and Bracket Expression Grammar

This subclause presents the grammar for basic regular expressions, including the bracket expression grammar that is common to both BREs and EREs.

    %token  ORD_CHAR QUOTED_CHAR SPEC_CHAR DUP_COUNT
    %token  BACKREF L_ANCHOR R_ANCHOR
    %token  Back_open_paren  Back_close_paren    /* '\(' '\)' */
    %token  Back_open_brace  Back_close_brace    /* '\{' '\}' */

    /* The following tokens are for the Bracket Expression
       grammar common to both REs and EREs. */

    %token  COLL_ELEM META_CHAR
    %token  Open_equal Equal_close Open_dot Dot_close Open_colon Colon_close
    /*      '[='       '=]'        '[.'     '.]'      '[:'       ':]'  */
    %token  class_name
    /* class_name is a keyword to the LC_CTYPE locale category */
    /* (representing a character class) in the current locale  */
    /* and is only recognized between [: and :]                */

    %start  basic_reg_exp
    %%

    /* --------------------------------------------
       Basic Regular Expression
       --------------------------------------------
    */
    basic_reg_exp    :          RE_expression
                     | L_ANCHOR
                     |                        R_ANCHOR
                     | L_ANCHOR               R_ANCHOR
                     | L_ANCHOR RE_expression
                     |          RE_expression R_ANCHOR
                     | L_ANCHOR RE_expression R_ANCHOR
                     ;
    RE_expression    :               simple_RE
                     | RE_expression simple_RE
                     ;
    simple_RE        : nondupl_RE
                     | nondupl_RE RE_dupl_symbol
                     ;
    nondupl_RE       : one_character_RE
                     | Back_open_paren RE_expression Back_close_paren
                     | Back_open_paren Back_close_paren
                     | BACKREF
                     ;
    /*
       Note: This grammar does not permit L_ANCHOR or
       R_ANCHOR inside \( and \) (which implies that ^ and $
       are ordinary characters). This reflects the semantic
       limits on the application, as noted in 2.8.3.5.
       Implementations are permitted to extend the language to
       interpret ^ and $ as anchors in these locations, and as
       such portable applications shall not use unescaped ^
       and $ in positions inside \( and \) that might be
       interpreted as anchors.
    */
    one_character_RE : ORD_CHAR
                     | QUOTED_CHAR
                     | '.'
                     | bracket_expression
                     ;
    RE_dupl_symbol   : '*'
                     | Back_open_brace DUP_COUNT               Back_close_brace
                     | Back_open_brace DUP_COUNT ','           Back_close_brace
                     | Back_open_brace DUP_COUNT ',' DUP_COUNT Back_close_brace
                     ;

    /* --------------------------------------------
       Bracket Expression
       --------------------------------------------
    */
    bracket_expression : '[' matching_list    ']'
                       | '[' nonmatching_list ']'
                       ;
    matching_list      : bracket_list
                       ;
    nonmatching_list   : '^' bracket_list
                       ;
    bracket_list       : follow_list
                       | follow_list '-'
                       ;
    follow_list        :             expression_term
                       | follow_list expression_term
                       ;
    expression_term    : single_expression
                       | range_expression
                       ;
    single_expression  : end_range
                       | character_class
                       | equivalence_class
                       ;
    range_expression   : start_range end_range
                       | start_range '-'
                       ;
    start_range        : end_range '-'
                       ;
    end_range          : COLL_ELEM
                       | collating_symbol
                       ;
    collating_symbol   : Open_dot COLL_ELEM Dot_close
                       | Open_dot META_CHAR Dot_close
                       ;
    equivalence_class  : Open_equal COLL_ELEM Equal_close
                       ;
    character_class    : Open_colon class_name Colon_close
                       ;

2.8.5.3 ERE Grammar

This subclause presents the grammar for extended regular expressions, excluding the bracket expression grammar.

NOTE: The bracket expression grammar and the associated %token lines are identical between BREs and EREs. They have been omitted from the ERE subclause to avoid unnecessary editorial duplication.

    %token  ORD_CHAR QUOTED_CHAR SPEC_CHAR DUP_COUNT

    %start  extended_reg_exp
    %%

    /* --------------------------------------------
       Extended Regular Expression
       --------------------------------------------
    */
    extended_reg_exp   : anchored_ERE
                       | nonanchored_ERE
                       | extended_reg_exp '|' nonanchored_ERE
                       | extended_reg_exp '|' anchored_ERE
                       ;
    anchored_ERE       : '^' nonanchored_ERE
                       | '^' nonanchored_ERE '$'
                       |     nonanchored_ERE '$'
                       | '^'
                       | '$'
                       | '^' '$'
                       ;
    nonanchored_ERE    :                 ERE_expression
                       | nonanchored_ERE ERE_expression
                       ;
    ERE_expression     : one_character_ERE
                       | '(' extended_reg_exp ')'
                       | ERE_expression ERE_dupl_symbol
                       ;
    one_character_ERE  : ORD_CHAR
                       | '\' SPEC_CHAR
                       | '.'
                       | bracket_expression
                       ;
    ERE_dupl_symbol    : '*'
                       | '+'
                       | '?'
                       | '{' DUP_COUNT               '}'
                       | '{' DUP_COUNT ','           '}'
                       | '{' DUP_COUNT ',' DUP_COUNT '}'
                       ;

BEGIN_RATIONALE

2.8.6 Regular Expression Notation Rationale. (This subclause is not a part of P1003.2)

Editor's Note: Some of the text and headings of this rationale have been rearranged. Moved text has not been diffmarked unless it changed.

Rather than repeating the description of regular expressions for each utility supporting REs, the working group preferred a common, comprehensive description of regular expressions in one place. The most common behavior is described here, and exceptions or extensions to this are documented for the respective utilities, if appropriate.

The Basic Regular Expression corresponds to the ed or historical grep type, and the Extended Regular Expression corresponds to the historical egrep type (now grep -E).

The text is based on the ed description and substantially modified, primarily to aid developers and others in the understanding of the capabilities and limitations of regular expressions. Much of this was influenced by the internationalization requirements.
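The two notations differ most visibly in how the grouping and interval operators are spelled: a BRE writes them as \( \) and \{ \}, an ERE as ( ) and { }, and in a BRE the unescaped forms are ordinary characters (see the SPEC_CHAR token in 2.8.5.1). The sketch below assumes a POSIX <regex.h>; the helper re_search is hypothetical. It compiles both spellings of the same expression and lets the caller compare the offsets reported for the whole match.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` with `cflags` (0 selects a
   BRE, REG_EXTENDED an ERE), search `subject`, and report the offsets
   of the whole match (*so = first matched byte, *eo = one past the
   last).  An invalid pattern is reported as "no match". */
static bool re_search(const char *pattern, int cflags, const char *subject,
                      regoff_t *so, regoff_t *eo)
{
    regex_t re;
    regmatch_t m[1];
    bool found;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, cflags) != 0)
        return false;
    found = (regexec(&re, subject, 1, m, 0) == 0);
    if (found) {
        *so = m[0].rm_so;
        *eo = m[0].rm_eo;
    }
    regfree(&re);
    return found;
}
```

Searching xababab with the BRE \(ab\)\{2\} and with the ERE (ab){2} should yield the same offsets, 1 and 5; compiled as a BRE, (ab){2} instead matches those seven characters literally.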
It should be noted that the definitions in this clause do not cover the tr utility (see 4.64); the tr syntax does not employ regular expressions.

The specification of regular expressions is particularly important to internationalization, because pattern matching is a very basic operation in business and other applications. The syntax and rules of regular expressions are intended to be as intuitive as possible, to make them easy to understand and use. The historical rules and behavior do not provide that capability to non-English-language users, and do not provide the necessary support for commonly used characters and language constructs. It was necessary to provide extensions to the historical regular expression syntax and rules, to accommodate other languages. Such modifications were proposed by the UniForum Technical Committee Subcommittee on Internationalization and accepted by the working group. As they are limited to bracket expressions, the rationale for these modifications can be found in 2.8.6.3.2.

2.8.6.1 Regular Expression Definitions Rationale. (This subclause is not a part of P1003.2)

The definition of which sequence is matched when several are possible is based on the leftmost-longest rule historically used by deterministic recognizers. This rule is much easier to define and describe, and arguably more useful, than the first-match rule historically used by nondeterministic recognizers. It is thought that dependencies on the choice of rule are rare; carefully contrived examples are needed to demonstrate the difference.

A formal expression of the leftmost-longest rule is:

    The search is performed as if all possible suffixes of the string were tested for a prefix matching the pattern; the longest suffix containing a matching prefix is chosen, and the longest possible matching prefix of the chosen suffix is identified as the matching sequence.
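The formal statement of the leftmost-longest rule can be transcribed almost literally into a brute-force search. The following C sketch is illustrative only: it assumes a POSIX <regex.h> and an ERE containing no ^ or $ anchors (the ^(...)$ wrapper used here would change their meaning), and it performs a quadratic number of regexec() calls. The name leftmost_longest is hypothetical. Trying start offsets from the left corresponds to choosing the longest suffix, and trying end offsets from the right corresponds to choosing the longest matching prefix of that suffix.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical brute-force rendering of the leftmost-longest rule for
   an anchor-free ERE.  The pattern is wrapped as ^(ERE)$ so that
   regexec() accepts a candidate substring only if the ERE matches it
   in its entirety.  On success, s[*start .. *end) is the match. */
static bool leftmost_longest(const char *ere, const char *s,
                             size_t *start, size_t *end)
{
    size_t n, cap;
    char *wrapped, *buf;
    regex_t re;
    bool found = false;

    assert(ere != NULL && s != NULL);
    n = strlen(s);
    cap = strlen(ere) + sizeof "^()$";      /* sizeof counts the NUL */
    wrapped = malloc(cap);
    buf = malloc(n + 1);
    if (wrapped == NULL || buf == NULL) {
        free(wrapped);
        free(buf);
        return false;
    }
    snprintf(wrapped, cap, "^(%s)$", ere);
    if (regcomp(&re, wrapped, REG_EXTENDED | REG_NOSUB) == 0) {
        /* Leftmost: the first start offset admitting any match wins
           (this is the longest suffix containing a matching prefix). */
        for (size_t i = 0; i <= n && !found; i++) {
            /* Longest: try candidate ends j = n, n-1, ..., i. */
            for (size_t j = n + 1; j-- > i; ) {
                memcpy(buf, s + i, j - i);
                buf[j - i] = '\0';
                if (regexec(&re, buf, 0, NULL, 0) == 0) {
                    *start = i;
                    *end = j;
                    found = true;
                    break;
                }
            }
        }
        regfree(&re);
    }
    free(wrapped);
    free(buf);
    return found;
}
```

For the ERE a|ab and the string xabc, this search should select offsets 1 to 3 (ab), whereas a first-match recognizer that tries the alternatives in order would stop at a. It should also reproduce the positions quoted in 2.8.4.3, such as the fourth through seventh characters of acabbbcde for b+(bc).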
It is possible to determine what strings correspond to subexpressions by recursively applying the leftmost-longest rule to each subexpression, but only with the proviso that the overall match is leftmost-longest (see 2.8.1.2). For example, matching \(ac*\)c*d[ac]*\1 against acdacaaa should match acdacaaa (with \1=a); simply matching the longest match for \(ac*\) would yield \1=ac, but the overall match would be smaller (acdac). In principle, the implementation must examine every possible match and, among those that yield the leftmost-longest total matches, pick the one that does the longest match for the leftmost subexpression, and so on. Note that this means that matching by subexpressions is context dependent: a subexpression within a larger RE may match a different string from the one it would match as an independent RE, and two instances of the same subexpression within the same larger RE may match different lengths even in similar sequences of characters. For example, in the ERE (a.*b)(a.*b), the two identical subexpressions would match four and six characters, respectively, of accbaccccb. Thus, it is not possible to hierarchically decompose the matching problem into smaller, independent matching problems.

Matching is based on the bit pattern used for encoding the character, not on the graphic representation of the character. This means that if a character set contains two or more encodings for a graphic symbol, or if the strings searched contain text encoded in more than one code set, no attempt is made to search for any other representation of the encoded symbol. If that is required, the user can specify equivalence classes containing all variations of the desired graphic symbol.
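The context dependence described above is visible in the subexpression offsets that regexec() reports. This sketch assumes a POSIX <regex.h>; ere_groups is a hypothetical helper. Matching (a.*b)(a.*b) against accbaccccb should place the first subexpression at offsets 0 to 4 (four characters) and the second at 4 to 10 (six characters), whereas the same (a.*b) used on its own spans all ten characters.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: match the ERE `pattern` against `subject` and
   fill m[0 .. nslots-1]; m[0] describes the whole match and m[k] the
   k-th parenthesized subexpression (rm_so/rm_eo byte offsets). */
static bool ere_groups(const char *pattern, const char *subject,
                       regmatch_t *m, size_t nslots)
{
    regex_t re;
    bool found;

    assert(pattern != NULL && subject != NULL && m != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return false;
    found = (regexec(&re, subject, nslots, m, 0) == 0);
    regfree(&re);
    return found;
}
```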
The definition of ``single character'' has been expanded to include also collating elements consisting of two or more characters; this expansion is applicable only when a bracket expression is included in the BRE or ERE. An example of such a collating element may be the Dutch ``ij'', which collates as a ``y''. In some encodings, a ligature ``i with j'' exists as a character, and would represent a single-character collating element. In another encoding, no such ligature exists, and the two-character sequence ``ij'' is defined as a multicharacter collating element. Outside brackets, the ``ij'' is treated as a two-character RE and will match the same characters in a string. Historically, a bracket expression only matched a single character. If, however, the bracket expression defines, for example, a range that includes ``ij'', then this particular bracket expression will also match a sequence of the two characters ``i'' and ``j'' in the string.

2.8.6.2 Regular Expression General Requirements Rationale. (This subclause is not a part of P1003.2)

Historically, most regular expression implementations only match lines, not strings. However, that is more an effect of the usage than of an inherent feature of regular expressions themselves. Consequently, POSIX.2 does not regard <newline>s as special; they are ordinary characters, and both a period and a nonmatching list can match them. Those utilities (like grep) that do not allow <newline>s to match are responsible for eliminating any <newline> from strings before matching against the RE. The regcomp() function, however, can provide support for such processing without violating the rules of this clause.
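In the C binding, this support takes the form of the REG_NEWLINE flag of regcomp(). Without it, <newline> is an ordinary character, as this clause requires; with it, a period and a nonmatching list no longer match <newline>, and ^ and $ also match immediately after and before an embedded <newline>. The following is a minimal sketch, assuming a POSIX <regex.h>; matches is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if `pattern`, compiled with `cflags`
   (e.g. REG_EXTENDED, optionally | REG_NEWLINE), matches somewhere
   in `subject`. */
static bool matches(const char *pattern, int cflags, const char *subject)
{
    regex_t re;
    bool found;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return false;
    found = (regexec(&re, subject, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

For instance, matches("a.b", REG_EXTENDED, "a\nb") should succeed, while the same call with REG_EXTENDED | REG_NEWLINE should fail, which is the line-oriented behavior a utility such as grep needs.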
The definition of case-insensitive processing is intended to allow matching of multicharacter collating elements as well as characters. For instance, as each character in the string is matched using both its cases, the RE [[.Ch.]], when matched against char, is in reality matched against ch, Ch, cH, and CH.

Some implementations of egrep have had very limited flexibility in handling complex extended regular expressions. POSIX.2 does not attempt to define the complexity of a BRE or ERE, but does place a lower limit on it: any regular expression must be handled, as long as it can be expressed in 256 bytes or less. (Of course, this does not place an upper limit on the implementation.) There are existing programs using a nondeterministic-recognizer implementation that should have no difficulty with this limit. It is possible that a good approach would be to attempt to use the faster, but more limited, deterministic recognizer for simple expressions and to fall back on the nondeterministic recognizer for those expressions requiring it. Nondeterministic implementations must be careful to observe the 2.8.1.2 rules on which match is chosen; the longest match, not the first match, starting at a given character is used.

The term ``invalid'' highlights a difference between this clause and some others: POSIX.2 frequently avoids mandating errors for syntax violations because they can be used by implementors to trigger extensions. However, the authors of the internationalization features of regular expressions desired to mandate errors for certain conditions to identify usage problems or nonportable constructs. These are identified within this rationale as appropriate. The remaining syntax violations have been left implicitly or explicitly undefined. For example, the BRE construct \{1,2,3\} does not comply with the grammar.
A conforming application cannot rely on it producing an error or on it matching the literal characters \{1,2,3\}. The term ``undefined'' was used in favor of ``unspecified'' because many of the situations are considered errors on some implementations, and it was felt that consistency throughout the clause was preferable to mixing undefined and unspecified.

2.8.6.3 Basic Regular Expressions Rationale. (This subclause is not a part of P1003.2)

2.8.6.3.1 BREs Matching a Single Character or Collating Element Rationale. (This subclause is not a part of P1003.2)

2.8 Regular Expression Notation 149 P1003.2/D11.2 INFORMATION TECHNOLOGY--POSIX

2.8.6.3.2 RE Bracket Expression Rationale. (This subclause is not a part of P1003.2)

If a bracket expression must specify both - and ], then the ] must be placed first (after the ^, if any) and the - last within the bracket expression.

Range expressions are, historically, an integral part of regular expressions. However, the requirements of ``natural language behavior'' and portability do conflict: ranges must be treated according to the current collating sequence, and include those characters that fall within the range based on that collating sequence, regardless of character values. This, however, means that the interpretation will differ depending on the collating sequence. If, for instance, one collating sequence defines ``ä'' as a variant of ``a'', while another defines it as a letter following ``z'', then the expression [ä-z] is valid in the first language and invalid in the second. This kind of ambiguity should be avoided in portable applications, and therefore the working group elected to state that ranges must not be used in strictly conforming applications; however, implementations must support them.
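Two of the rules above can be checked with a small sketch:

- the longest-match rule that nondeterministic implementations must observe;
- the placement of ] first and - last in a bracket expression.

The sketch assumes the C regcomp() and regexec() interface. match_extent() is a hypothetical helper that reports where the overall match begins and ends.

```c
#include <regex.h>
#include <stdbool.h>

/* Hypothetical helper: compile `pattern` with `cflags`, search `subject`,
   and store the byte offsets of the overall match in *start and *end.
   Returns false if the pattern fails to compile or does not match. */
static bool match_extent(const char *pattern, int cflags,
                         const char *subject, regoff_t *start, regoff_t *end)
{
    regex_t re;
    regmatch_t m[1];

    if (regcomp(&re, pattern, cflags) != 0)
        return false;
    bool found = regexec(&re, subject, 1, m, 0) == 0;
    if (found) {
        *start = m[0].rm_so;   /* offset of the first matched byte */
        *end = m[0].rm_eo;     /* offset just past the last matched byte */
    }
    regfree(&re);
    return found;
}
```

The expected results are as follows:

- With REG_EXTENDED, "a|ab" searched in "abc" reports offsets 0 to 2 rather than 0 to 1, because the longest match starting at the first character wins.
- "[]x-]+" searched in "a]-x]b" reports offsets 1 to 5, treating both ] and - as ordinary list members.
- Adding REG_ICASE makes "chr" match "CHR".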
Some historical implementations allow range expressions where the ending range point of one range is also the starting point of the next (for instance [a-m-o]). This behavior should not be permitted, but to avoid breaking existing implementations, it is now undefined whether it is a valid expression, and how it should be interpreted.

Current practice in awk and lex is to accept escape sequences in bracket expressions as per Table 2-15, while the normal regular expression behavior is to regard such a sequence as consisting of two characters. Allowing the awk/lex behavior in regular expressions would change the normal behavior in an unacceptable way; it is expected that awk and lex will decode escape sequences in regular expressions before passing them to regcomp() or comparable routines. Each utility describes the escape sequences it accepts as an exception to the rules in this clause; the list is not the same, for historical reasons.

As noted earlier, the new syntax and rules have been added to accommodate languages other than English. These modifications were proposed by the UniForum Subcommittee on Internationalization and accepted by the working group. The remainder of this clause describes the rationale for these modifications.

Internationalization Requirements

The goal of the internationalization effort was to provide functions and capabilities that matched the capabilities of existing implementations, but that adhered to the user's local customs, rules, and environment. This has also been described as ``removing the ASCII (and English language) bias.'' In addition, other requirements also influence the standardization efforts, such as portability, extensibility, and compatibility.
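The division of labor described above can be sketched in C. In that division, the utility decodes its escape sequences and regcomp() keeps the normal two-character reading.

- decode_escapes() is a hypothetical pre-pass. It handles only \n and \t, chosen as examples; the real awk and lex lists differ, as the text notes.
- ere_matches() is a hypothetical wrapper around regcomp() and regexec().

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical awk/lex-style pre-pass: replace the escape sequences \n
   and \t with the characters they denote before the pattern reaches
   regcomp(). Every other backslash pair (for example "\\" or "\.") is
   copied through unchanged, so regcomp() still sees it. `out` needs
   strlen(in) + 1 bytes; decoding never makes the pattern longer. */
static void decode_escapes(const char *in, char *out)
{
    while (*in != '\0') {
        if (in[0] == '\\' && in[1] == 'n') {
            *out++ = '\n';
            in += 2;
        } else if (in[0] == '\\' && in[1] == 't') {
            *out++ = '\t';
            in += 2;
        } else if (in[0] == '\\' && in[1] != '\0') {
            *out++ = *in++;    /* copy the backslash ... */
            *out++ = *in++;    /* ... and the character it escapes */
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}

/* Hypothetical helper: true if the ERE `pattern` compiles and matches
   somewhere in `subject`. */
static bool ere_matches(const char *pattern, const char *subject)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool found = regexec(&re, subject, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

Passed straight to regcomp(), the bracket expression [\t] is a list of two characters, backslash and t. After the pre-pass it is a list containing only <tab>.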
In a worldwide environment portability carries much weight. Wherever feasible, users should be given the capability to develop code that can execute independently of character set, code set, or language. Standards must also be extensible: to support further development, to allow for local or regional extensions, or to accommodate new concepts (such as multibyte characters). Compatibility refers not only to support of existing code, but also to making the new syntax, semantics, and functions compatible with existing environments and implementations.

Internationalization Technical Background

The C Standard {7} (and, by implication, also POSIX) recognizes that the ASCII character set used in historical UNIX system implementations is not adequate outside the Anglo-American language area. It is, however, not enough to remove the ASCII bias; the dependency on Anglo-Saxon conventions and rules must also be broadened to accommodate other cultures, including those that require thousands of characters.

Character sets are defined by their attributes; typical attributes are the encoding, the collating sequence, the character classification, and the case mapping. It is also recognized that, even within one language area, several combinations of attributes exist: character set attributes are mutable and combinatory. So, rather than replacing one straitjacket by another, the proposed standards make character sets user-definable and program-selectable.

The existence of character set attributes is implicit in regular expressions (REs). This implies that regular expressions must recognize and adapt to the program-selected set of attributes. A program selects the appropriate character set (or combination of attributes) using the mechanism described in 2.5.
The definition of a character set (its attributes) is external to an executing program. Many combinations of attributes can exist concurrently. Of particular interest are the following attributes:

(1) Collating Sequence. In existing implementations, the encoded ASCII ordering matches the logical English collating sequence. This correspondence does not exist for all code sets or languages. In addition, many languages employ concepts that have no counterparts in English collation:

(a) In many languages, ordering is based on the concept of string collation rather than character collation as in English. One of the effects of this is that the ordering is based on collating elements rather than on characters. Characters typically map into collating elements:

- One-to-one mapping, where a character is also a collating element,

- One-to-N mapping, where a single character maps into two or more collating elements (as the German ``ß'' (eszet), which collates as ``ss''),

- N-to-one mapping, where two or more characters map into one collating element (as in the Spanish ``ll'', which collates between ``l'' and ``m''; i.e., a word beginning with ``ll'' collates after a word beginning with ``lo'').

(b) A common method for adding characters to an alphabet is to use diacritical marks, such as accents or circumflex (^). In some languages, this creates a completely new character, collated differently from the Latin ``base.'' In other languages these accented characters are collated as variants of the Latin base letter; i.e., they have the same relative order; they are equivalent.
If the strings (words) being compared are equal except for ``accents,'' the strings can be ordered based on a secondary ordering within the ``equivalence class.'' For instance, in French, the words ``tache'', ``tâche'', and ``tacheter'' collate in that order.

The C Standard {7} recognizes this; it includes new library functions (such as strcoll() and strxfrm()) capable of handling complex collation rules. These functions depend on the setting of the setlocale() category LC_COLLATE for a definition of the current collation rules.

(2) Character Classification. Character classification and case mapping are another area where each language (or even language area) has its own rules. Although users in different countries can use the same code set, such as ISO 8859-1 {5}, the definition of what constitutes a letter or an uppercase letter may vary.

The C Standard {7} recognizes this; library functions used to classify characters or perform case mapping depend on the setlocale() category LC_CTYPE for a definition of how characters map to character classes.

Internationalization Proposal Areas

Based on the requirements and attribute characteristics defined above, and after reviewing proposals and definitions by X/Open and other organizations, the UniForum Subcommittee on Internationalization decided to concentrate on the following areas: the range expression, character classes, the definition of one-character RE (multicharacter element), and equivalence classes. Most of these are heavily dependent on the current definition of collation sequence; the Subcommittee felt it natural to couple the capabilities and interpretation of bracket expressions closely to the requirements for extended collation capabilities.
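The C library collation support mentioned above can be used as follows: a program adopts the locale named in its environment and then compares strings with strcoll(). This is a minimal sketch.

- sort_words() and use_environment_locale() are hypothetical helpers.
- In the POSIX (``C'') locale, strcoll() orders strings exactly like strcmp(), so "B" sorts before "a".
- Whether a French or Spanish locale is installed, and how it would order words such as tâche, depends on the system.

```c
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: order two C strings by the rules of the current
   LC_COLLATE category, via strcoll(), instead of by encoded byte values. */
static int collate_cmp(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Hypothetical helper: sort `n` words in place in collation order. */
static void sort_words(const char **words, size_t n)
{
    qsort(words, n, sizeof *words, collate_cmp);
}

/* Hypothetical helper: adopt the locale named by the environment
   (LC_ALL, LC_COLLATE, LANG, ...). Returns the resulting LC_COLLATE
   name, or NULL if the environment names an unavailable locale. */
static const char *use_environment_locale(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return NULL;
    return setlocale(LC_COLLATE, NULL);
}
```

After setlocale(LC_ALL, "C"), sorting {"b", "a", "B"} yields "B", "a", "b". Calling use_environment_locale() first would let the user's collation rules decide the order instead.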
In addition, the Subcommittee felt that the capabilities described in 2.5 formed a suitable basis for runtime control of regular expression behavior. The Subcommittee realized that the mechanism selected requires changes in the existing syntax. As a rule, the Subcommittee wished to minimize changes and avoid syntactical changes that may cause existing regular expressions to fail.

(1) Collating Elements and Symbols. As noted above, many expressions within a bracket expression are closely connected with collation, and the Subcommittee defined many capabilities in terms of collating elements and collating symbols. A collating element is defined as a sequence of one or more bytes defined in the current collating sequence definition as a unit of collation. In most cases, a collating element is equal to a character, but the collation sequence may exclude some characters, or define two or more characters as a collating element.

A one-character RE is, logically enough, defined as one character or something that translates into one character (the number of bits used to represent the character is not an issue here). The expression within square brackets is a one-character RE; i.e., single characters are matched against the list of single characters defined within the brackets.

In Spanish, the phrase ``a to d'' means the sequence of collating elements a, á, b, c, ch, and d. Consequently, with a Spanish character set, the range statement [a-d] includes the ch collating element, even though it is expressed with two characters (N-to-1 mapping). The historical syntax, however, allows the user neither to define the range from a through ch nor to define ch as a single character rather than as either c or h.
The Subcommittee decided that N-to-1 mappings should be recognized (if properly delimited) as one-character REs inside, but not outside, square brackets (e.g., a period will never match ch). To be distinguishable from a list of the characters themselves, the multicharacter element must be delimited from the remainder of the characters in the string. The characters [. and .] are used to delimit a multicharacter collating element from other elements, and can be used to delimit single-character collating elements.

(2) Equivalence Classes. As stated previously, many languages extend the Latin alphabet by using diacritical marks. In some cases, the Latin base character (e.g., a) and the accented versions of the base (e.g., à, â in French) constitute a ``subclass'' of characters with some partially equivalent characteristics but different code values. Because these characters are related, they are often processed as a group. The historical syntax, however, does not provide for this in a portable manner. Although it represents an extension of the historical capabilities, the X/Open group strongly recommended that a properly delimited collating element be recognized as representing an equivalence class, that is, as the collating element itself and all other characters with the same primary order in the collation sequence. The Subcommittee supported this recommendation, and also selected [= and =] as delimiters for equivalence classes.

(3) Range Expressions. The hyphen historically indicated ``a range of consecutive ASCII characters;'' typically it stands for the word ``to,'' as in ``a to z,'' and implies an ordered interval. In ASCII, the encoded order matches the logical English order; this is not true with other encodings or with other alphabets.
If the ASCII dependency is removed, an alternative could have been to use the encoded sequence of whatever code set is currently used. This, however, would certainly decrease portability and would require the user to know the ordering of the current code set. It would also most certainly be counter-intuitive; a French user would expect the expression [a-d] to match any of the letters a, à, â, b, c, ç, or d. The Subcommittee regards this interpretation of ranges as most compatible with existing capabilities, and one that provides for the desired portability.

As the logical ordering need not be inherent in the encoded sequence, an external definition was required. Such a definition was already present via the collating sequence attribute of the character set. The setlocale() function provides for an LC_COLLATE category, which defines the current collating sequence. The Subcommittee selected this as the basis for the interpretation of ranges, as well as of equivalence classes and multicharacter collating symbols.

(4) Character Classes. The range expression is commonly used to indicate a character class; the ex(AU_CMD) section of the SVID states: ``...
a pair of characters separated by - defines a range (e.g., a-z defines any lowercase letter)....'' In reality, [a-z] means ``any lowercase letter between a and z, inclusive.'' This is only equivalent to ``any lowercase letter'' if a is the first and z is the last lowercase letter in the collating sequence.

To provide the intended capabilities in a portable way, the Subcommittee introduced a new syntactical element, namely an explicit character class. The definition of which characters constitute a specific character class is already present via the LC_CTYPE category of the setlocale() function. The Subcommittee selected the identification of character classes by name, bracketed by [: and :]. A character class cannot be used as an endpoint in a range statement.

Internationalization Syntax

The Subcommittee was careful to propose changes in the regular expression syntax that minimize the impact on existing REs. In evaluating alternatives, the Subcommittee looked at ease of use (terseness, ease of remembering, keyboard availability), impact on historical REs (compatibility), implementability, performance, and how error-prone the syntax is likely to be (ambiguity). The Subcommittee made the following evaluation:

(1) Syntax changes must be limited to expressions within square brackets.

(2) Strings or characters with special meaning must be delimited from ordinary strings, to avoid compatibility problems.

(3) Both the initial and the terminating delimiter should consist of two characters, to minimize compatibility and ambiguity problems.

(4) The outer delimiter character should be bracketing; i.e., it should naturally indicate the initial and terminating side. Examples: {} <> ().
(5) The brackets ([]) are, due to the special rules for ``brackets within brackets,'' rather unlikely to be used in the intended way (a closing bracket must precede an open bracket in the existing syntax).

(6) To minimize ambiguity, brackets must be paired with another character. Many other symbols are already in use, either within regular expressions or in the shell. Examples of usable characters are: = . :

(7) Because a multicharacter collating element also can be a member of an equivalence class, different delimiters must be chosen for these two expressions. Also, the character class expression must be distinguishable from, e.g., multicharacter collating symbols; although no historical example is known to the Subcommittee, prudence dictated that character classes be given separate delimiters.

(8) The Subcommittee selected the period as the secondary delimiter for multicharacter collating symbols.

(9) The Subcommittee selected the equals-sign as the secondary delimiter for equivalence classes.

(10) The Subcommittee selected the colon as the secondary delimiter for character classes.

The specific syntax and facilities described in this clause represent a coalescence of proposals and implementations from several vendors. Due to differences in facilities and syntax, it was not possible to take one implementation and codify it. There are now several implementations closely patterned on the existing proposal.

The facilities presented in this clause are described in a manner that does not preclude their use with multibyte character sets. However, no attempt has been made to include facilities specifically intended for such character sets.

The definitions of character classes are tied to the LC_CTYPE definition. The set of character classes defined in the C Standard {7} represents the
minimum set of character classes required worldwide, i.e., those required by all implementations. It is the working group's belief that local standards bodies, as well as individual vendors, will provide extensions to the standard in these areas, for instance to provide Kanji character classes.

In many historical implementations, an invalid range is treated as if it consisted of the endpoints only. For example, [z-a] is treated as [za]. Some implementations treat the above range as [z], and others as [-az]. None of these is correct, and the working group decided that this should be treated as an error.

It was proposed that the syntax for bracket expressions be simplified such that the ``extra'' brackets are not needed if the bracket expression only consists of a character class, an equivalence class, or a collating symbol: ``[:alpha:]'' instead of ``[[:alpha:]]''. To avoid ambiguity under this proposal, if a bracket expression starts with :, =, or ., then it cannot contain a class expression or a collating symbol (or duplicated characters). In addition, it was also proposed that only valid class or collating symbol expressions be accepted: e.g., [[:ctrl:]] is an invalid expression. The working group rejected the proposal. While the syntax [:alpha:] may be intuitive to some, the proposal does not allow, e.g., [:digit:.ch.]. The alternative, to require additional brackets for the latter case, would probably cause more errors than the historical syntax. Requiring erroneous class expressions or collating symbols to make the regular expression invalid may minimize the risk of inadvertent spelling errors. However, at this point it was judged that this would reduce consensus.

Consideration was given to eliminating the [.ch.] syntax and providing that collating elements should be recognized as such both inside and outside bracket expressions.
In addition, consideration was given to defining character classes such that collating elements are included. The working group rejected these proposals. The [.ch.] syntax is only required inside bracket expressions because a bracket expression historically only matched a single character. If ch is a collating element, a range [a-z] (if ``ch'' falls within it) matches ch. Outside brackets, an expression ch is treated as two concatenated characters, matching the string ``ch''. The [.ch.] expression is intended to allow the specification of a multicharacter collating element separately from ranges in a bracket expression. Character classes are not intended to include collating elements; there is no requirement that all characters in a multicharacter collating element belong to the same character class (for instance, ``Ch'' is ``alpha'' but neither ``upper'' nor ``lower''). Introducing collating elements in character classes would be nonintuitive.

It was suggested that, because ranges may or may not be meaningful (or even accepted) based on the current collating sequence, they should be eliminated from the syntax (or at least marked obsolescent). It was further suggested that, e.g., [z-a] should either always or never be an error, regardless of collating sequence. The working group did not wish to eliminate ranges from the syntax. While it is true that ranges may not be universally portable, they are nevertheless a useful and fundamental construct in regular expressions. The regular expression syntax has consciously been extended to provide both increased portability and extended local capabilities. Where supported, ranges must reflect the current collating sequence.
The working group instead elected to include range expressions as an implementation requirement, but to state that strictly conforming applications (but not, e.g., National-Body-conforming applications) shall not use range expressions. Treating erroneous ranges as invalid points out that these may not be portable across collating sequences, and is better than (silently) making them behave in a way contrary to the intent of the user.

Earlier drafts allowed the use of an equivalence class expression as the starting or ending point of a range expression, such as [[=e=]-f]. This now produces unspecified results because it is possible to define the equivalence class as a disjoint set of characters. This example could produce different results on various systems:

- An error.

- The equivalent of [[=e=]e-f] (which is the correct portable way to include equivalence class effects in a bracket expression).

- All of the collating elements from the lowest value found in the equivalence class, including any of the elements found between the disjoint values.

Consideration was given to saying that equivalence classes with disjoint elements produce unspecified results at the start or end of a range, but since the application cannot predict which equivalence classes are disjoint, this is no improvement over the more general statement chosen.

It was suggested that, while reference to nonprintable characters is partially supported by the proposed set of character classes, the specificity is not precise enough, and that additional character classes should be supported, e.g., [:tab:] or [:a:]. The working group rejected this proposal, because this feature would represent a substantial enhancement to the current regular expression syntax, and one that cannot be based on internationalization requirements. It was judged that its inclusion would reduce consensus.
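The bracket-expression forms discussed in this rationale can be exercised in the POSIX (``C'') locale. In that locale every character is its own collating element and forms its own equivalence class. The forms are:

- character classes ([:alpha:]);
- collating symbols ([.-.]);
- equivalence classes ([=a=]);
- the portable [[=e=]e-f] spelling;
- the reversed range [z-a].

ere_search() is a hypothetical helper. The sketch assumes an implementation that rejects [[.ch.]] with an error (such as REG_ECOLLATE) when ch is not a collating element, and that treats [z-a] as an error, as this rationale describes. In a locale that does define ch as a collating element, [[.ch.]] would be expected to compile instead.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` as an ERE in the current locale
   and search `subject`. Returns 1 for a match, 0 for no match, and -1 if
   regcomp() rejects the pattern. */
static int ere_search(const char *pattern, const char *subject)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

The expected results in the ``C'' locale are:

- "^[[:digit:][.-.]]+$" accepts "12-34", since the hyphen is given as a collating symbol rather than as a range operator.
- "^[[=e=]e-f]+$" accepts "eff".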
A future revision of regular expressions should study the capability to create temporary character classes for use in regular expressions; a ``character class macro facility.''

2.8.6.3.3 BREs Matching Multiple Characters Rationale. (This subclause is not a part of P1003.2)

The limit of nine backreferences to subexpressions in the RE is based on the use of a single-digit identifier; increasing this to multiple digits would break historical applications. This does not imply that only nine subexpressions are allowed in REs. The following is a valid BRE with ten subexpressions:

\(\(\(ab\)*c\)*d\)\(ef\)*\(gh\)\{2\}\(ij\)*\(kl\)*\(mn\)*\(op\)*\(qr\)*

The working group regards the common current behavior, which supports \n* (where \n is a backreference), but not \n\{min,max\}, \(...\)*, or \(...\)\{min,max\}, as an unintentional result of a specific implementation, and supports both duplication and interval expressions following subexpressions and backreferences.

2.8.6.3.4 Expression Anchoring Rationale. (This subclause is not a part of P1003.2)

Often, the dollar-sign is viewed as matching the ending <newline> in text files. This is not strictly true; the <newline> is typically eliminated from the strings to be matched, and the dollar-sign matches the terminating null character.

The ability of ^, $, and * to be nonspecial in certain circumstances may be confusing to some programmers, but this situation was changed only in a minor way from historical practice to avoid breaking many existing scripts. Some consideration was given to making the use of the anchoring characters undefined if not escaped and not at the beginning or end of strings.
This would cause a number of historical BREs, such as 2^10, $HOME, and $1.35, which relied on the characters being treated literally, to become invalid.

However, one relatively uncommon case was changed to allow an extension used on some implementations. Historically, the BREs ^foo and \(^foo\) did not match the same string, despite the general rule that subexpressions and entire BREs match the same strings. To achieve balloting consensus, POSIX.2 has allowed an extension on some systems to treat these two cases in the same way by declaring that anchoring may occur at the beginning or end of a subexpression. Therefore, portable BREs that require a literal circumflex at the beginning or a dollar-sign at the end of a subexpression must escape them. Note that a BRE such as a\(^bc\) will either match a^bc or nothing on different systems under the POSIX.2 rules.

ERE anchoring has been different from BRE anchoring in all historical systems. An unescaped anchor character has never matched its literal counterpart outside of a bracket expression. Some systems treated foo$bar as a valid expression that never matched anything; others treated it as invalid. POSIX.2 mandates the former, valid unmatched behavior.

Some systems have extended the BRE syntax to add alternation. For example, the subexpression \(foo$\|bar\) would match either foo at the end of the string or bar anywhere. The extension is triggered by the use of the undefined \| sequence. Because the BRE is undefined for portable scripts, the extending system is free to make other assumptions, such as that the $ represents the end-of-line anchor in the middle of a subexpression. If it were not for the extension, the $ would match a literal dollar-sign under the POSIX.2 rules.
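Several BRE rules from the two preceding rationale subclauses can be checked with regcomp():

- a BRE may contain ten subexpressions even though only nine can be backreferenced;
- duplication and intervals may follow backreferences and subexpressions;
- a ^ or $ that is not at an anchoring position is literal;
- an ERE such as foo$bar is valid but never matches.

This is a minimal sketch. re_search() and bre_subexpressions() are hypothetical helpers, and cflags of 0 selects a BRE. The portability-sensitive case a\(^bc\) is deliberately left out.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` with `cflags` (0 selects a BRE,
   REG_EXTENDED an ERE) and search `subject`. Returns 1 on a match,
   0 on no match, and -1 if the pattern does not compile. */
static int re_search(const char *pattern, int cflags, const char *subject)
{
    regex_t re;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Hypothetical helper: number of \( \) subexpressions in a BRE, as
   counted by regcomp() in re_nsub, or -1 if the BRE does not compile. */
static long bre_subexpressions(const char *bre)
{
    regex_t re;

    if (regcomp(&re, bre, 0) != 0)
        return -1;
    long n = (long)re.re_nsub;
    regfree(&re);
    return n;
}
```

The expected results are as follows:

- bre_subexpressions() reports 10 for the ten-subexpression BRE shown earlier.
- When compiled as BREs, "2^10" and "$1.35" each match their literal text.
- re_search("foo$bar", REG_EXTENDED, "foo$bar") returns 0, not -1: the ERE compiles but does not match.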
2.8.6.4 Extended Regular Expressions Rationale. (This subclause is not a part of P1003.2)

As with basic regular expressions, the working group decided to make the interpretation of escaped ordinary characters undefined.

The right-parenthesis is not listed as an ERE special character because it is only special in the context of a preceding left-parenthesis. If found without a preceding left-parenthesis, the right-parenthesis has no special meaning.

Based on objections in several ballots, the interval expression, {m,n}, has been added to extended regular expressions. Historically, the interval expression has only been supported in some extended regular expression implementations. The working group estimated that the addition of interval expressions to extended regular expressions would not decrease consensus, and would also make basic regular expressions more of a subset of extended regular expressions than in many historical implementations.

It was suggested that, in addition to interval expressions, backreferences (\n) also should be added to extended regular expressions. This was rejected by the working group as likely to decrease consensus.

In historical implementations, multiple duplication symbols are usually interpreted from left to right and treated as additive. As an example, a+*b matches zero or more instances of a followed by a b. In POSIX.2, multiple duplication symbols are undefined; i.e., they cannot be relied upon for portable applications. One reason for this is to provide some scope for future enhancements; the current syntax is very crowded.

The precedence of operations differs between EREs and those in lex; in lex, for historical reasons, interval expressions have a lower precedence than concatenation.
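Two ERE rules from this subclause can be checked with a short sketch:

- the interval expression {m,n};
- the ordinary meaning of a right-parenthesis that has no preceding left-parenthesis.

The sketch assumes regcomp() with REG_EXTENDED, and ere_search() is a hypothetical helper. The unmatched-parenthesis check assumes an implementation that follows the rule stated here; an implementation that reports an error instead would return -1. Multiple duplication symbols such as a+*b are undefined, so they are not tested.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` as an ERE and search `subject`.
   Returns 1 for a match, 0 for no match, and -1 if regcomp() rejects it. */
static int ere_search(const char *pattern, const char *subject)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With the anchored pattern ^(ab){2,3}$, the subexpression must occur two or three times. "abab" and "ababab" therefore match, while "ab" and "abababab" do not.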
2.8.6.5 Regular Expression Grammar Rationale. (This subclause is not a part of P1003.2)

None.

END_RATIONALE

2.9 Dependencies on Other Standards

2.9.1 Features Inherited from POSIX.1

This subclause describes some of the features provided by POSIX.1 {8} that are assumed to be globally available on all systems conforming to POSIX.2. This subclause does not attempt to detail all of the POSIX.1 {8} features that are required by all of the utilities and functions defined in this standard; the utility and function descriptions point out additional functionality required to provide the corresponding specific features needed by each.

The following subclauses describe frequently used concepts. Utility and function description statements override these defaults when appropriate.

BEGIN_RATIONALE

2.9.1.0.1 Features Inherited from POSIX.1 Rationale. (This subclause is not a part of P1003.2)

It has been pointed out that POSIX.2 assumes that a lot of POSIX.1 {8} functionality is present, but never states exactly how much. This is an attempt to clarify the assumptions. This subclause only covers the ``utilities and functions defined by this standard.'' It does not mandate that the specific POSIX.1 {8} interfaces themselves be available to all application programs. A C language program compiled on a POSIX.2 system is not guaranteed access to any of the POSIX.1 {8} functions. (For example, although UNIX system-based implementations of ls will use stat() to get file status, a POSIX.2 implementation of ls on a ``LONG_NAME_OS-based'' implementation might use the get_file_attributes() and the get_file_time_stamps() system calls.) POSIX.2 only requires equivalent functionality, not equal means of access.
In any event, programs requiring the POSIX.1 {8} system interface should specify that they need POSIX.1 {8} conformance and not hope to achieve it by piggybacking on POSIX.2.

END_RATIONALE

2.9.1.1 Process Attributes

The following process attributes, as described in POSIX.1 {8}, are assumed to be supported for all processes in POSIX.2:

    controlling terminal             real group ID
    current working directory        real user ID
    effective group ID               root directory
    effective user ID                saved set-group-ID
    file descriptors                 saved set-user-ID
    file mode creation mask          session membership
    process ID                       supplementary group IDs
    process group ID

A conforming implementation may include additional process attributes.

BEGIN_RATIONALE

2.9.1.1.1 Process Attributes Rationale. (This subclause is not a part of P1003.2)

The supplementary group IDs requirement is minimal. If {NGROUPS_MAX} is defined to be zero, they are not required. If {NGROUPS_MAX} is greater than zero, the supplementary group IDs are used as described in POSIX.1 {8} in various permission checking operations.

The saved set-group-ID and saved set-user-ID requirements are also minimal. If {_POSIX_SAVED_IDS} is defined, they are required; otherwise, they are not.

A controlling terminal is needed to control access to /dev/tty.

The file creation semantics of POSIX.2 require the effective group ID, effective user ID, and the file mode creation mask.

Pathname resolution and access permission checks require the current working directory, effective group ID, effective user ID, and root directory.
The kill utility requires the effective group ID, effective user ID, process ID, process group ID, real group ID, real user ID, saved set-group-ID, saved set-user-ID, and session membership attributes to perform the various signal addressing and permission checks.

The id utility is based on the effective group ID, effective user ID, real group ID, real user ID, and supplementary group IDs.

The following process attributes described in POSIX.1 {8} do not seem to be required by POSIX.2: parent process ID, pending signals, process signal mask, time left until an alarm clock signal, tms_cstime, tms_cutime, tms_stime, and tms_utime. There are probably other attributes mentioned in POSIX.1 {8} that are not listed here.

END_RATIONALE

2.9.1.2 Concurrent Execution of Processes

The following functionality of the POSIX.1 {8} fork() function shall be available on all POSIX.2 conformant systems:

(1) Independent processes shall be capable of executing independently without either process terminating.

(2) A process shall be able to create a new process with all of the attributes referenced in 2.9.1.1, determined according to the semantics of a call to the POSIX.1 {8} fork() function followed by a call in the child process to one of the POSIX.1 {8} exec functions.

BEGIN_RATIONALE

2.9.1.2.1 Concurrent Execution of Processes Rationale. (This subclause is not a part of P1003.2)

The historical functionality of fork() is required, which permits the concurrent execution of independent processes. A system with a single thread of process execution is not an appropriate base upon which to build a POSIX.2 system. (This requirement was not explicitly stated in the 1988 POSIX.1, but is included in the current POSIX.1 {8}.)
END_RATIONALE

2.9.1.3 File Access Permissions

The file access control mechanism described by file access permissions in 2.2.2.55 applies to all files on a conforming POSIX.2 implementation.

BEGIN_RATIONALE

2.9.1.3.1 File Access Permissions Rationale. (This subclause is not a part of P1003.2)

The entire concept of file protections and access control is assumed to be handled as in POSIX.1 {8}.

END_RATIONALE

2.9.1.4 File Read, Write, and Creation

When a file is to be read or written, the file shall be opened with an access mode corresponding to the operation to be performed. If file access permissions deny access, the requested operation shall fail.

When a file that does not exist is created, the following POSIX.1 {8} features shall apply unless the utility or function description states otherwise:

(1) The file's user ID is set to the effective user ID of the calling process.

(2) The file's group ID is set to the effective group ID of the calling process or the group ID of the directory in which the file is being created.

(3) The file's permission bits are set to:

        S_IROTH | S_IWOTH | S_IRGRP | S_IWGRP | S_IRUSR | S_IWUSR

    (see POSIX.1 {8} 5.6.1.2) except that the bits specified by the process's file mode creation mask are cleared.

(4) The st_atime, st_ctime, and st_mtime fields of the file shall be updated as specified in file times update in 2.2.2.69.

(5) If the file is a directory, it shall be an empty directory; otherwise the file shall have length zero.

(6) Unless otherwise specified, the file created shall be a regular file.
When an attempt is made to create a file that already exists, the action shall depend on the file type:

(1) For directories and FIFO special files, the attempt shall fail and the utility shall either continue with its operation or exit immediately with a nonzero status, depending on the description of the utility.

(2) For regular files:

    (a) The file's user ID, group ID, and permission bits shall not be changed.

    (b) The file shall be truncated to zero length.

    (c) The st_ctime and st_mtime fields shall be marked for update.

(3) For other file types, the effect is implementation defined.

When a file is to be appended, the file shall be opened in a manner equivalent to using the O_APPEND flag, without the O_TRUNC flag, in the POSIX.1 {8} open() call.

BEGIN_RATIONALE

2.9.1.4.1 File Read, Write, and Creation Rationale. (This subclause is not a part of P1003.2)

Even though it might be possible for a process to change the mode of a file to match a requested operation and change the mode back to its original state after the operation is completed, utilities are not allowed to do this unless the utility description states otherwise. As an example, the r command of the ed utility fails if the file to be read does not exist (even though it could create the file and then read it) or the file permissions do not allow read access [even though it could use the POSIX.1 {8} chmod() function to make the file readable before attempting to open the file].

END_RATIONALE

2.9.1.5 File Removal

When a directory that is the root directory or current working directory of any process is removed, the effect is implementation defined. If file access permissions deny access, the requested operation shall fail.
Otherwise, when a file is removed:

(1) Its directory entry shall be removed from the file system.

(2) The link count of the file shall be decremented.

(3) If the file is an empty directory (see 2.2.2.43):

    (a) If no process has the directory open, the space occupied by the directory shall be freed and the directory shall no longer be accessible.

    (b) If one or more processes have the directory open, the directory contents shall be preserved until all references to the file have been closed.

(4) If the file is a directory that is not empty, the st_ctime field shall be marked for update.

(5) If the file is not a directory:

    (a) If the link count becomes zero:

        [1] If no process has the file open, the space occupied by the file shall be freed and the file shall no longer be accessible.

        [2] If one or more processes have the file open, the file contents shall be preserved until all references to the file have been closed.

    (b) If the link count is not reduced to zero, the st_ctime field shall be marked for update.

(6) The st_ctime and st_mtime fields of the containing directory shall be marked for update.

BEGIN_RATIONALE

2.9.1.5.1 File Removal Rationale. (This subclause is not a part of P1003.2)

This is intended to be a summary of the POSIX.1 {8} unlink() and rmdir() requirements needed by POSIX.2.

END_RATIONALE

2.9.1.6 File Time Values

All files have the three time values described by file times update in 2.2.2.69.

BEGIN_RATIONALE

2.9.1.6.1 File Time Values Rationale. (This subclause is not a part of P1003.2)

All three time stamps specified by POSIX.1 {8} are needed for utilities like find, ls, make, test, and touch to work as expected.

END_RATIONALE
2.9.1.7 File Contents

When a reference is made to the contents of a file, pathname, this means the equivalent of all of the data placed in the space pointed to by buf when performing the read() function calls in the following POSIX.1 {8} operations:

    while (read (fildes, buf, nbytes) > 0)
        ;

If the file is indicated by a pathname pathname, the file descriptor shall be determined by the equivalent of the following POSIX.1 operation:

    fildes = open (pathname, O_RDONLY);

The value of nbytes in the above sequence is unspecified; if the file is of a type where the data returned by read() would vary with different values, the value shall be one that results in the most data being returned.

If the read() function calls would return an error, it is unspecified whether the contents of the file are considered to include any data from offsets in the file beyond where the error would be returned.

BEGIN_RATIONALE

2.9.1.7.1 File Contents Rationale. (This subclause is not a part of P1003.2)

This description is intended to convey the traditional behavior for all types of files. This matches the intuitive meaning for regular files, but the meaning is not always intuitive for other types of files. In particular, for FIFOs, pipes, and terminals it must be clear that the contents are not necessarily static at the time a file is opened, but they include the data returned by a sequence of reads until end-of-file is indicated. This is why the open() call is specified, with the O_NONBLOCK flag not set.

Some files, especially character special files, are sensitive to the size of a read() request. The contents of the file are those resulting from proper choice of this size.

END_RATIONALE
2.9.1.8 Pathname Resolution

The pathname resolution algorithm described by pathname resolution in 2.2.2.104 shall be used by conforming POSIX.2 implementations. See also file hierarchy in 2.2.2.58.

BEGIN_RATIONALE

2.9.1.8.1 Pathname Resolution Rationale. (This subclause is not a part of P1003.2)

The whole concept of hierarchical file systems and pathname resolution is assumed to be handled as in POSIX.1 {8}.

END_RATIONALE

2.9.1.9 Changing the Current Working Directory

When the current working directory (see 2.2.2.159) is to be changed, unless the utility or function description states otherwise, the operation shall succeed unless a call to the POSIX.1 {8} chdir() function would fail when invoked with the new working directory pathname as its argument.

BEGIN_RATIONALE

2.9.1.9.1 Changing the Current Working Directory Rationale. (This subclause is not a part of P1003.2)

This subclause covers the access permissions and pathname structures involved with changing directories, such as with the cd or (the UPE-extended) mailx utilities.

END_RATIONALE

2.9.1.10 Establish the Locale

The functionality of the POSIX.1 {8} setlocale() function is assumed to be available on all POSIX.2 conformant systems; i.e., utilities that require the capability of establishing an international operating environment shall be permitted to set the specified category of the international environment.

BEGIN_RATIONALE

2.9.1.10.1 Establish the Locale Rationale. (This subclause is not a part of P1003.2)

The entire concept of locale categories such as the LC_* variables along with any implementation-defined categories is assumed to be handled as in POSIX.1 {8}.
END_RATIONALE

2.9.1.11 Actions Equivalent to POSIX.1 Functions

Some utility descriptions specify that a utility performs actions equivalent to a POSIX.1 {8} function. Such specifications require only that the external effects be equivalent, not that any effect within the utility and visible only to the utility be equivalent.

BEGIN_RATIONALE

2.9.1.11.1 Actions Equivalent to POSIX.1 Functions Rationale. (This subclause is not a part of P1003.2)

An objection to an earlier draft argued that this approach of equivalent functions was unreasonable, as the reader (and the person writing a test suite) would be responsible for interpreting which portions of POSIX.1 {8} were included and which were not. For example, would such intermediate effects as the setting of errno be required if the related POSIX.1 {8} function called for that? The answer is no: this standard is only concerned with the end results of functions against the file system and the environment, and not any intermediate values or results visible only to the programmer using the POSIX.1 {8} function in a C (or other high-level language) program.

END_RATIONALE

2.9.2 Concepts Derived from the C Standard

Some of the standard utilities perform complex data manipulation using their own procedure and arithmetic languages, as defined in their Extended Description or Operands subclauses. Unless otherwise noted, the arithmetic and semantic concepts (precision, type conversion, control flow, etc.) are equivalent to those defined in the C Standard {7}, as described in the following subclauses. Note that there is no requirement that the standard utilities be implemented in any particular programming language.

BEGIN_RATIONALE

2.9.2.0.1 Concepts Derived from the C Standard Rationale. (This subclause is not a part of P1003.2)

This subclause was introduced to answer complaints that the descriptions of such utilities as awk or sh presented insufficient detail about their procedural control statements and their methods of performing arithmetic functions. Earlier drafts, derived heavily from the original manual pages, contained statements such as ``for loops similar to the C Standard {7},'' which were good enough for a general understanding, but insufficient for a real implementation.

The C Standard {7} was selected as a model because most historical implementations of the standard utilities were written in C. Thus, it is more likely that they will act in a manner desired by POSIX.2 without modification.

Using the C Standard {7} is primarily a notational convenience, so the many ``little languages'' in POSIX.2 would not have to be rigorously described in every aspect. Its selection does not require that the standard utilities be written in Standard C; they could be written in common-usage C, Ada, Pascal, assembler language, or anything else.

The sizes of the various numeric values refer to C-language datatypes that are allowed to be different sizes by the C Standard {7}. Thus, like a C-language application, a shell application cannot rely on their exact size. However, it can rely on their minimum sizes expressed in the C Standard {7}, such as {LONG_MAX} for a long type.

END_RATIONALE

2.9.2.1 Arithmetic Precision and Operations

Integer variables and constants, including the values of operands and option-arguments, used by the standard utilities shall be implemented as equivalent to the C Standard {7} signed long data type; floating point shall be implemented as equivalent to the C Standard {7} double type.
Conversions between types shall be as described in the C Standard {7}. All variables shall be initialized to zero if they are not otherwise assigned by the application's input. Arithmetic operators and functions shall be implemented as equivalent to those in the cited C Standard {7} section, as listed in Table 2-14. The evaluation of arithmetic expressions shall be equivalent to that described in the C Standard {7} section 3.3 Expressions.

Table 2-14 - C Standard Operators and Functions

    Operation                                 C Standard {7} Equivalent Reference
    ----------------------------------------  -----------------------------------
    ( )                                       3.3.1  Primary Expressions
    postfix ++, postfix --                    3.3.2  Postfix Operators
    unary +, unary -, prefix ++, prefix --,   3.3.3  Unary Operators
      ~, !, sizeof()
    *, /, %                                   3.3.5  Multiplicative Operators
    +, -                                      3.3.6  Additive Operators
    <<, >>                                    3.3.7  Bitwise Shift Operators
    <, <=, >, >=                              3.3.8  Relational Operators
    ==, !=                                    3.3.9  Equality Operators
    &                                         3.3.10 Bitwise AND Operator
    ^                                         3.3.11 Bitwise Exclusive OR Operator
    |                                         3.3.12 Bitwise Inclusive OR Operator
    &&                                        3.3.13 Logical AND Operator
    ||                                        3.3.14 Logical OR Operator
    expr ? expr : expr                        3.3.15 Conditional Operator
    =, *=, /=, %=, +=, -=,                    3.3.16 Assignment Operators
      <<=, >>=, &=, ^=, |=
    if ( ), if ( ) ... else, switch ( )       3.6.4  Selection Statements
    while ( ), do ... while ( ), for ( )      3.6.5  Iteration Statements
    goto, continue, break, return             3.6.6  Jump Statements

2.9.2.2 Mathematic Functions

Any mathematic functions with the same names as those in the following sections of the C Standard {7}:

    4.5     Mathematics
    4.10.2  Pseudo-random sequence generation functions

shall be implemented to return the results equivalent to those returned from a call to the corresponding C function described in the C Standard {7}.

2.10 Utility Conventions

2.10.1 Utility Argument Syntax

This subclause describes the argument syntax of the standard utilities and introduces terminology used throughout the standard for describing the arguments processed by the utilities.

Within the standard, a special notation is used for describing the syntax of a utility's arguments. Unless otherwise noted, all utility descriptions use this notation, which is illustrated by this example (see 3.9.1):

    utility_name [-a] [-b] [-c option_argument] [-d | -e] [-foption_argument] [operand ...]

The notation used for the Synopsis subclauses imposes requirements on the implementors of the standard utilities and provides a simple reference for the reader of the standard.

(1) The utility in the example is named utility_name. It is followed by options, option-arguments, and operands.
    The arguments that consist of hyphens and single letters or digits, such as -a, are known as options (or, historically, flags). Certain options are followed by an option-argument, as shown with [-c option_argument]. The arguments following the last options and option-arguments are named operands.

(2) Option-arguments are sometimes shown separated from their options by <blank>s, sometimes directly adjacent. This reflects the situation that in some cases an option-argument is included within the same argument string as the option; in most cases it is the next argument. The Utility Syntax Guidelines in 2.10.2 require that the option be a separate argument from its option-argument, but there are some exceptions in this standard to ensure continued operation of historical applications:

    (a) If the Synopsis of a standard utility shows a <blank> between an option and option-argument (as with [-c option_argument] in the example), a conforming application shall use separate arguments for that option and its option-argument.

    (b) If a <blank> is not shown (as with [-foption_argument] in the example), a conforming application shall place an option and its option-argument directly adjacent in the same argument string, without intervening <blank>s.

    (c) Notwithstanding the requirements on conforming applications, a conforming implementation shall permit, but shall not require, an application to specify options and option-arguments as separate arguments whether or not a <blank> is shown on the synopsis line.

    (d) A standard utility may also be implemented to operate correctly when the required separation into multiple arguments is violated by a nonconforming application.
(3) Options are usually listed in alphabetical order unless this would make the utility description more confusing. There are no implied relationships between the options based upon the order in which they appear, unless otherwise stated in the Options subclause, or unless the exception in Guideline 11 in 2.10.2 applies. If an option that does not have option-arguments is repeated, the results are undefined, unless otherwise stated.

(4) Frequently, names of parameters that require substitution by actual values are shown with embedded underscores. Alternatively, parameters are shown as follows:

        <parameter name>

    The angle brackets are used for the symbolic grouping of a phrase representing a single parameter and shall never be included in data submitted to the utility.

(5) When a utility has only a few permissible options, they are sometimes shown individually, as in the example. Utilities with many flags generally show all of the individual flags (that do not take option-arguments) grouped, as in:

        utility_name [-abcDxyz] [-p arg] [operand]

    Utilities with very complex arguments may be shown as follows:

        utility_name [options] [operands]

(6) Unless otherwise specified, whenever an operand or option-argument is or contains a numeric value:

    - The number shall be interpreted as a decimal integer.

    - Numerals in the range 0 to 2147483647 shall be syntactically recognized as numeric values.

    - When the utility description states that it accepts negative numbers as operands or option-arguments, numerals in the range -2147483647 to 2147483647 shall be syntactically recognized as numeric values.

    This does not mean that all numbers within the allowable range are necessarily semantically correct.
    A standard utility that accepts an option-argument or operand that is to be interpreted as a number, and for which a range of values smaller than that shown above is permitted by this standard, describes that smaller range along with the description of the option-argument or operand. If an error is generated, the utility's diagnostic message shall indicate that the value is out of the supported range, not that it is syntactically incorrect.

(7) Arguments or option-arguments enclosed in the [ and ] notation are optional and can be omitted. The [ and ] symbols shall never be included in data submitted to the utility.

(8) Arguments separated by the | vertical bar notation are mutually exclusive. The | symbols shall never be included in data submitted to the utility. Alternatively, mutually exclusive options and operands may be listed with multiple Synopsis lines. For example:

        utility_name -d [-a] [-c option_argument] [operand ...]
        utility_name -e [-b] [operand ...]

    When multiple synopsis lines are given for a utility, that is an indication that the utility has mutually exclusive arguments. These mutually exclusive arguments alter the functionality of the utility so that only certain other arguments are valid in combination with one of the mutually exclusive arguments. Only one of the mutually exclusive arguments is allowed for invocation of the utility. Unless otherwise stated in an accompanying Options subclause, the relationships between arguments depicted in the Synopsis subclauses are mandatory requirements placed on conforming applications. The use of conflicting mutually exclusive arguments produces undefined results, unless a utility description specifies otherwise.
    When an option is shown without the [ ] brackets, it means that option is required for that version of the Synopsis. However, unless otherwise stated, it is not required to be the first argument, even though it is shown first in the example above.

(9) Ellipses (...) are used to denote that one or more occurrences of an option or operand are allowed. When an option or an operand followed by ellipses is enclosed in brackets, zero or more options or operands can be specified. The forms

        utility_name -f option_argument ... [operand ...]
        utility_name [-g option_argument] ... [operand ...]

    indicate that multiple occurrences of the option and its option-argument preceding the ellipses are valid, with semantics as indicated in the Options subclause of the utility. (See also Guideline 11 in 2.10.2.) In the first example, each option-argument requires a preceding -f and at least one -f option_argument must be given.

(10) When the synopsis line is too long to be printed on a single line in this document, the indented lines following the initial line are continuation lines. An actual use of the command would appear on a single logical line.

BEGIN_RATIONALE

2.10.1.1 Utility Argument Syntax Rationale. (This subclause is not a part of P1003.2)

This is the subclause where the definitions of option, option-argument, and operand come together.

The working group felt that recent trends toward diluting the Synopsis subclauses of historical manual pages to something like:

        command [options] [operands]

were a disservice to the reader. Therefore, considerable effort was placed into rigorous definitions of all the command line arguments and their interrelationships. The relationships depicted in the Synopses are normative parts of this standard; this information is sometimes repeated in textual form, but that is only for clarity within context.
The use of ``undefined'' for conflicting argument usage and for repeated usage of the same option is meant to prevent portable applications from using conflicting arguments or repeated options, unless specifically allowed, as is the case with ls (which allows simultaneous, repeated use of the -C, -l, and -1 options). Many historical implementations will tolerate this usage, choosing either the first or the last applicable argument, and this tolerance can continue, but portable applications cannot rely upon it. (Other implementations may choose to print usage messages instead.)

The use of ``undefined'' for conflicting argument usage also allows an implementation to make reasonable extensions to utilities where the implementor considers mutually exclusive options according to POSIX.2 to have a sensible meaning and result.

POSIX.2 does not define the result of a utility when an option-argument or operand is not followed by ellipses and the application specifies more than one of that option-argument or operand. This allows an implementation to define valid (although nonstandard) behavior for the utility when more than one such option or operand is specified.

Allowing <blank>s after an option (i.e., placing an option and its option-argument into separate argument strings) when the standard does not require it encourages portability of users, while still preserving backward compatibility of scripts. Inserting <blank>s between the option and the option-argument is preferred; however, historical usage has not been consistent in this area; therefore, <blank>s are required to be handled by all implementations, but implementations are also allowed to handle the historical syntax.
Another justification for selecting the multiple-argument method was that the single-argument case is inherently ambiguous when the option-argument can legitimately be a null string.

Wording was also added to explicitly state that digits are permitted as operands and option-arguments. The lower and upper bounds for the values of the numbers used for operands and option-arguments were derived from the C Standard {7} values for {LONG_MIN} and {LONG_MAX}. The requirement on the standard utilities is that numbers in the specified range do not cause a syntax error. However, the specification of a number need not be semantically correct for a particular operand or option-argument of a utility. For example, the specification of

    dd obs=3000000000

would yield undefined behavior for the application and could be a syntax error, because the number 3000000000 is outside of the range -2147483647 to +2147483647. On the other hand,

    dd obs=2000000000

may cause some error, such as ``blocksize too large,'' rather than a syntax error.

END_RATIONALE

2.10.2 Utility Syntax Guidelines

The following guidelines are established for the naming of utilities and for the specification of options, option-arguments, and operands. Clause 7.5 describes a function that assists utilities in handling options and operands that conform to these guidelines.

Operands and option-arguments can contain characters not specified in 2.4.

The guidelines are intended to provide guidance to the authors of future utilities. Some of the standard utilities do not conform to all of these guidelines; in those cases, the Options subclauses describe the deviations.

Guideline 1: Utility names should be between two and nine characters, inclusive.
Guideline 2: Utility names should include lowercase letters (the lower character classification) from the set described in 2.4 and digits only.

Guideline 3: Each option name should be a single alphanumeric character (the alnum character classification) from the set described in 2.4. The -W (capital-W) option shall be reserved for vendor extensions.

NOTE: The other alphanumeric characters are subject to standardization in the future, based on historical usage. Implementors should be aware that future POSIX working groups may offer little sympathy to vendors with isolated extensions in conflict with future drafts.

Guideline 4: All options should be preceded by the '-' delimiter character.

Guideline 5: Options without option-arguments should be accepted when grouped behind one '-' delimiter.

Guideline 6: Each option and option-argument should be a separate argument, except as noted in 2.10.1, item (2).

Guideline 7: Option-arguments should not be optional.

Guideline 8: When multiple option-arguments are specified to follow a single option, they should be presented as a single argument, using commas within that argument or <blank>s within that argument to separate them.

Guideline 9: All options should precede operands on the command line.

Guideline 10: The argument "--" should be accepted as a delimiter indicating the end of options. Any following arguments should be treated as operands, even if they begin with the '-' character. The "--" argument should not be used as an option or as an operand.

Guideline 11: The order of different options relative to one another should not matter, unless the options are documented as mutually exclusive and such an option is documented to override any incompatible options preceding it.
If an option that has option-arguments is repeated, the option and option-argument combinations should be interpreted in the order specified on the command line.

Guideline 12: The order of operands may matter, and position-related interpretations should be determined on a utility-specific basis.

Guideline 13: For utilities that use operands to represent files to be opened for either reading or writing, the "-" operand should be used only to mean standard input. It may also mean standard output when it is clear from context that an output file is being specified.

Any utility claiming conformance to these guidelines shall conform completely to these guidelines, as if these guidelines contained the term ``shall'' instead of ``should.'' The exception is that the utility is permitted to accept usage in violation of these guidelines for backward compatibility, as long as the required form is also accepted.

Guidelines 1 and 2 are offered as guidance for locales using Latin alphabets. No recommendations are made by this standard concerning utility naming in other locales.

BEGIN_RATIONALE

2.10.2.1 Utility Syntax Guidelines Rationale. (This subclause is not a part of P1003.2)

This subclause is based on the rules listed in the SVID. It was included for two reasons:

(1) The individual utility descriptions in Sections 4, 5, and 6, and Annexes A and C needed a set of common (although not universal) actions on which they could anchor their descriptions of option and operand syntax. Most of the standard utilities actually do use these guidelines, and many of their historical implementations use the getopt() function for their parsing. Therefore, it was simpler to cite the rules and merely identify exceptions.
(2) Writers of portable applications need suggested guidelines if the POSIX community is to avoid the chaos of historical UNIX system command syntax. It is recommended that all future utilities and applications use these guidelines to enhance ``user portability.'' Some historical utilities could not be changed, because changing them would break existing applications. This should not deter this future goal.

The voluntary nature of the guidelines is highlighted by repeated uses of the word ``should'' throughout. This usage should not be misinterpreted to imply that utilities that claim conformance in their Options subclauses do not always conform.

Guideline 2 makes recommendations for the naming of utilities. In 3.9.1, it is further stated that a command used in the shell command language cannot be named with a trailing colon.

Guideline 3 was changed to allow alphanumeric characters (letters and digits) from the character set described in 2.4, for compatibility with historical usage. Historical practice allows the use of digits wherever practical, and there are no portability issues that would prohibit the use of digits. In fact, from an internationalization viewpoint, digits (being language-independent) are preferable to letters. A ``-2'' is intuitively self-explanatory to any user. In ``-f filename'', by contrast, the letter f is a mnemonic aid only to speakers of Latin-based languages where ``filename'' happens to translate to a word that begins with f. Since Guideline 3 still retains the word ``single,'' multidigit options are not allowed. Instances of historical utilities that used them have been marked obsolescent in this standard, with the numbers being changed from option names to option-arguments.

It is difficult to come up with a satisfactory solution to the problem of namespace in option characters.
When the POSIX.2 group desired to extend the historical cc utility to accept C Standard {7} programs, it found that all of the portable alphabet was already in use by various vendors. Thus, it had to devise a new name, c89, rather than something like cc -X.

There were suggestions that implementors be restricted to providing extensions through various means that would reserve all of the remaining alphanumeric characters for future POSIX standards. Examples were using a plus-sign as the option delimiter or using option characters outside the alphanumeric set. These approaches were resisted because they lacked the historical style of UNIX. Furthermore, if a vendor-provided option should become commonly used in the industry, it would be a candidate for standardization. It would be desirable to standardize such a feature using existing practice for the syntax (the semantics can be standardized with any syntax). This would not be possible if the syntax were one reserved for the vendor. However, since the standardization process may lead to minor changes in the semantics, it may prove better for a vendor to use a syntax that will not be affected by standardization.

As a compromise, the following statements are made by the developers of POSIX.2:

- In future revisions to this standard, and in other POSIX standards, every attempt will be made to develop new utilities and features that conform to the Utility Syntax Guidelines.

- Future extensions and additions to POSIX standards will not use the -W (capital W) option. This option is forever reserved to implementors for extensions, in a manner reminiscent of the option's use in historical versions of the cc utility. The other alphanumeric characters are subject to standardization in the future, based on historical usage.
Implementors should be cognizant of these intentions and aware that future POSIX working groups will offer little sympathy to vendors with extensions in conflict with future drafts. In the first version of POSIX.2, vendors held a virtual veto power when conflicts arose with their extensions. In the future, POSIX working groups may be less concerned about preserving isolated extensions that conflict with these statements of intent.

Guideline 8 includes the concept of comma-separated lists in a single argument. It is up to the utility to parse such a list itself, because getopt() just returns the single string. This situation was retained so that certain historical utilities would not violate the guidelines.

Applications preparing for international use should be aware of an occasional problem with comma-separated lists: in some locales, the comma is used as the radix character. Thus, if an application is preparing operands for a utility that expects a comma-separated list, it should avoid generating noninteger values through any means influenced by the setting of the LC_NUMERIC variable [such as awk, bc, printf, or printf()].

Applications calling any utility with a first operand starting with "-" should usually specify "--", as indicated by Guideline 10, to mark the end of the options. This is true even if the Synopsis in this standard does not specify any options, because implementations may provide options as extensions to this standard. The standard utilities that do not support Guideline 10 indicate that fact in the Options subclause of the utility description.

Guideline 11 was modified to clarify that the order of different options should not matter relative to one another.
However, the order of repeated options that also have option-arguments may be significant. Therefore, such options are required to be interpreted in the order that they are specified. The make utility is an instance of a historical utility that uses repeated options in which the order is significant. Multiple files are specified by giving multiple instances of the -f option, for example:

    make -f common_header -f specific_rules target

Guideline 13 does not imply that all of the standard utilities automatically accept the operand "-" to mean standard input or output. Nor does it specify the actions of the utility upon encountering multiple "-" operands. It simply says that, by default, "-" operands shall not be used for other purposes in the file reading/writing utilities [but not in those that stat(), unlink(), or touch files, etc.]. All information concerning actual treatment of the "-" operand is found in the individual utility clauses.

An area of concern that was expressed during the balloting process was that, as implementations mature, implementation-defined utilities and implementation-defined utility options will result. The notion was expressed that there needed to be a standard way, say an environment variable or some such mechanism, to identify implementation-defined utilities separately from standard utilities that may have the same name. It was decided that several ways of dealing with this situation already exist. It was also decided that attempting to standardize in the area of nonstandard items is outside the scope of the standard.

A method that exists on some historical implementations is the use of the so-called /local/bin or /usr/local/bin directory to separate local or additional copies or versions of utilities. Another method that is also used is to isolate utilities into completely separate domains. Still another method to ensure that the desired utility is being used is to request the utility by its full pathname.
There are, to be sure, many approaches to this situation; the examples given above serve to illustrate that there is more than one.

END_RATIONALE

2.11 Utility Description Defaults

This clause describes all of the subclauses used within the utility clauses in Section 4 and the other sections that describe standard utilities. It describes:

(1) Intended usage of the subclause.

(2) Global defaults that affect all the standard utilities.

BEGIN_RATIONALE

2.11.0.1 Utility Description Defaults Rationale. (This subclause is not a part of P1003.2)

This clause is arranged with headings in the same order as all the utility descriptions. It is a collection of related and unrelated information concerning:

(1) The default actions of utilities.

(2) The meanings of notations used in the standard that are specific to individual utility subclauses.

Although this material may seem out of place in Section 2, it is important that this information appear before any of the utilities to be described later. Unfortunately, the utilities are split into multiple major sections (chapters). As a result, this information could not be placed into any one of those sections without confusing cross references.

END_RATIONALE

2.11.1 Synopsis

The Synopsis subclause summarizes the syntax of the calling sequence for the utility, including options, option-arguments, and operands. Standards for utility naming are described in 2.10.2, and standards for describing the utility's arguments in 2.10.1.

2.11.2 Description

The Description subclause describes the actions of the utility. If the utility has a very complex set of subcommands or its own procedural language, an Extended Description subclause is also provided.
Most explanations of optional functionality are omitted here, as they are usually explained in the Options subclause.

Some utilities in this standard are described in terms of equivalent POSIX.1 {8} functionality. As explained in 1.1, a fully conforming POSIX.1 {8} base is not a prerequisite for this standard. When specific functions are cited, the underlying operating system shall provide equivalent functionality and all side effects associated with successful execution of the function. The treatment of errors and intermediate results from the individual functions cited is generally not specified by this standard. See the utility's Exit Status and Consequences of Errors subclauses for all actions associated with errors encountered by the utility.

2.11.3 Options

The Options subclause describes the utility options and option-arguments, and how they modify the actions of the utility. Standard utilities that have options either fully comply with the Utility Syntax Guidelines in 2.10.2 or describe all deviations. Apparent disagreements between functionality descriptions in the Options and Description (or Extended Description) subclauses are always resolved in favor of the Options subclause.

Each Options subclause that uses the phrase ``The ... utility shall conform to the utility argument syntax guidelines ...'' refers only to the use of the utility as specified by this standard. Implementation extensions should also conform to the guidelines, but may allow exceptions for historical practice.

Unless otherwise stated in the utility description, standard utilities shall issue a diagnostic message to standard error and exit with a nonzero exit status in either of two cases: when given an option unrecognized by the implementation, or when a required option-argument is not provided.
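Here is a minimal sketch of that default error handling, again in Python with the getopt module. The utility name demo, its options -v and -o, and the exit status 2 are all hypothetical. This standard requires only a diagnostic on standard error and a nonzero exit status; the diagnostic's wording is unspecified.

```python
import getopt
import sys

USAGE = "usage: demo [-v] [-o file] [file ...]"  # hypothetical utility


def main(argv, stderr=sys.stderr):
    """Parse options for the hypothetical 'demo' utility; return its exit status."""
    try:
        opts, operands = getopt.getopt(argv, "vo:")
    except getopt.GetoptError as err:
        # Unrecognized option or missing option-argument: write a
        # diagnostic to standard error and exit with a nonzero status.
        print(f"demo: {err.msg}", file=stderr)
        print(USAGE, file=stderr)
        return 2
    # The utility's real work on opts and operands would go here.
    return 0
```

A real script would finish with sys.exit(main(sys.argv[1:])). Returning the status instead keeps the sketch easy to exercise.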
Default Behavior: When this subclause is listed as ``None,'' it means that the implementation need not support any options. Standard utilities that do not accept options, but that do accept operands, shall recognize "--" as a first argument to be discarded.

BEGIN_RATIONALE

2.11.3.1 Options Rationale. (This subclause is not a part of P1003.2)

Although it has not always been possible, the working group has tried to avoid repeating information. This reduces the risk that duplicate explanations drift out of sync with each other.

The requirement to recognize "--" exists because portable applications need a way to shield their operands from any arbitrary options that the implementation may provide as an extension. For example, suppose the standard utility foo is listed as taking no options, and the application needs to give it a pathname with a leading hyphen. The application could safely do it as:

    foo -- -myfile

and avoid any problems with -m used as an extension.

END_RATIONALE

2.11.4 Operands

The Operands subclause describes the utility operands, and how they affect the actions of the utility. Apparent disagreements between functionality descriptions in the Operands and Description (or Extended Description) subclauses are always resolved in favor of the Operands subclause.

If an operand naming a file can be specified as -, which means to use the standard input instead of a named file, this shall be explicitly stated in this subclause. Unless otherwise stated, the use of multiple instances of - to mean standard input in a single command produces unspecified results.

Unless otherwise stated, the standard utilities that accept operands shall process those operands in the order specified in the command line.
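The operand defaults above can be sketched in Python as follows. The sketch assumes a hypothetical utility whose Options subclause is ``None'' and whose Operands subclause explicitly allows "-" for standard input. The function names are illustrative only.

```python
import sys


def operands_of(argv):
    """Return the operands of a utility that accepts no options.

    Only a first argument of "--" is discarded, so "foo -- -myfile" yields
    the operand "-myfile"; a later "--" is an ordinary operand.
    """
    if argv and argv[0] == "--":
        return argv[1:]
    return list(argv)


def open_in_order(operands):
    """Yield (operand, file object) pairs in command-line order.

    "-" is mapped to standard input. Using "-" more than once in a single
    command gives unspecified results, so this sketch does not special-case
    it. Each named file is closed when the caller advances to the next
    operand.
    """
    for name in operands:
        if name == "-":
            yield name, sys.stdin
        else:
            with open(name) as f:
                yield name, f
```

Using a generator keeps at most one named file open at a time. Because it never sorts or reorders its input, it processes operands exactly in the order given.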
Default Behavior: When this subclause is listed as ``None,'' it means that the implementation need not support any operands.

BEGIN_RATIONALE

2.11.4.1 Operands Rationale. (This subclause is not a part of P1003.2)

This usage of - is never shown in the Synopsis. Similarly, this usage of -- is never shown.

The requirement for processing operands in command line order is to avoid a ``WeirdNIX'' utility that might choose to sort the input files alphabetically, by size, or by directory order. Although this might be acceptable for some utilities, in general the programmer has a right to know exactly what order will be chosen.

Some of the standard utilities take multiple file operands and act as if they were processing the concatenation of those files. For example,

    asa file1 file2

and

    cat file1 file2 | asa

have similar results when questions of file access, errors, and performance are ignored. Other utilities, such as grep or wc, have completely different results in these two cases. This latter type of utility is always identified in its Description or Operands subclauses, whereas the former is not. Although it might be possible to create a general assertion about the former case, the following points must be addressed:

- Access times for the files might be different in the operand case versus the cat case.

- The utility may have error messages that are cognizant of the input file name, and this added value should not be suppressed. (As an example, awk sets a variable with the file name at each file boundary.)

END_RATIONALE

2.11.5 External Influences

The External Influences subclause describes all input data that is specified by the invoker, data received from the environment, and other files or databases that may be used by the utility.
There are four subclauses that contain all the substantive information about external influences; because of this, this level of header is always left blank. Certain of the standard utilities describe how they can invoke other utilities or applications, such as by passing a command string to the command interpreter. The external requirements of such invoked utilities are not described in the subclause concerning the standard utility that invokes them.

2.11.5.1 Standard Input

The Standard Input subclause describes the standard input of the utility. This subclause is frequently merely a reference to the following subclause, because many utilities treat standard input and input files in the same manner. Unless otherwise stated, all restrictions described in Input Files apply to this subclause as well.

Use of a terminal for standard input may cause any of the standard utilities that read standard input to stop when used in the background. For this reason, applications should not use interactive features in scripts to be placed in the background.

The specified standard input format of the standard utilities shall not depend on the existence or value of the environment variables defined in this standard, except as provided by this standard.

Default Behavior: When this subclause is listed as ``None,'' it means that the standard input shall not be read when the utility is used as described by this standard.

BEGIN_RATIONALE

2.11.5.1.1 Standard Input Rationale. (This subclause is not a part of P1003.2)

This subclause was globally renamed from Standard Input Format in previous drafts. The new name better reflects its role in describing the existence and usage of the file, in addition to its format.
END_RATIONALE

2.11.5.2 Input Files

The Input Files subclause describes the files, other than the standard input, used as input by the utility. It includes files named as operands and option-arguments as well as other files that are referred to, such as startup/initialization files, databases, etc. Commonly used files are generally described in one place and cross-referenced by other utilities.

Some of the standard utilities, such as filters, process input files a line or a block at a time and have no restrictions on the maximum input file size. Some utilities may have size limitations that are not as obvious as file space or memory limitations. Such limitations should reflect resource limitations of some sort, not arbitrary limits set by implementors.

Implementations shall define in the conformance documentation those utilities that are limited by constraints other than file system space, available memory, and other limits specifically cited by this standard. For each such utility, the documentation shall identify what the constraint is and indicate a way of estimating when the constraint would be reached. Similarly, some utilities descend the directory tree (recursively). Implementations shall also document any limits that they may have in descending the directory tree that are beyond limits cited by this standard.

When a standard utility reads a seekable input file and terminates without an error before it reaches end-of-file, the utility shall ensure that the file offset in the open file description is properly positioned just past the last byte processed by the utility. For files that are not seekable, the state of the file offset in the open file description for that file is unspecified.

When an input file is described as a text file, the utility produces undefined results if given input that is not from a text file, unless otherwise stated. Some utilities (e.g., make, read, sh, etc.)
allow for continued input lines using an escaped <newline> convention. Unless otherwise stated, the utility need not be able to accumulate more than {LINE_MAX} bytes from a set of multiple, continued input lines. If a utility using the escaped <newline> convention detects an end-of-file condition immediately after an escaped <newline>, the results are unspecified.

Record formats are described in a notation similar to that used by the C language function, printf(). See 2.12 for a description of this notation.

Default Behavior: When this subclause is listed as ``None,'' it means that no input files are required to be supplied when the utility is used as described by this standard.

BEGIN_RATIONALE

2.11.5.2.1 Input Files Rationale. (This subclause is not a part of P1003.2)

This subclause was globally renamed from Input File Formats in previous drafts. The new name better reflects its role in describing the existence and usage of the files, in addition to their format.

The description of file offsets answers the question: Are the following three commands equivalent?

    tail -n +2 file
    (sed -n 1q; cat) < file
    cat file | (sed -n 1q; cat)

The answer is that a conforming application cannot assume they are equivalent. The second command is equivalent to the first only when the file is seekable. In the third command, suppose the file offset in the open file description were not left unspecified. Then sed would have to read from the pipe one byte at a time, or it would have to employ some method to seek backwards on the pipe. Such functionality is not defined currently in POSIX.1 {8} and does not exist on all historical systems. Other utilities, such as head, read, and sh, have similar properties, so the restriction is described globally in this clause.
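The offset requirement can be sketched in Python with the os module's file-descriptor calls. read_first_line is a hypothetical helper and does not describe how any particular sed is implemented. It reads in blocks for efficiency. On a seekable file, it then seeks back over any bytes read past the first line, as the second command needs. On a pipe the seek fails and the extra bytes are lost to the next reader, which is why the third command cannot be relied upon.

```python
import os


def read_first_line(fd, bufsize=4096):
    """Read one line from file descriptor fd and return it as bytes.

    Hypothetical helper illustrating the file offset rule: input is read in
    blocks. If bytes beyond the first line were read from a seekable file,
    the offset is moved back so that it lies just past the last byte
    processed (the terminating newline).
    """
    data = b""
    while b"\n" not in data:
        chunk = os.read(fd, bufsize)
        if not chunk:
            # End-of-file before a newline: everything read was processed,
            # so the offset is already in the right place.
            return data
        data += chunk
    end = data.index(b"\n") + 1
    extra = len(data) - end
    if extra:
        try:
            # Give the unprocessed bytes back to the next reader.
            os.lseek(fd, -extra, os.SEEK_CUR)
        except OSError:
            # Not seekable (for example, a pipe): the extra bytes cannot be
            # returned. The standard leaves the offset unspecified here.
            pass
    return data[:end]
```

On a regular file, a following os.read() sees the rest of the file, matching the second command. On a pipe, the bytes after the first line have already been consumed, matching the third.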
A future revision to this standard may require that the standard utilities leave the file offset in a consistent state for pipes as well as regular files.

The description of conformance documentation about file sizes follows many changes of direction by the working group. Originally, there appeared a limit, {ED_FILE_MAX}, that was intended to impose a minimum file size on ed, which has been historically limited to relatively small files. This received objections from various members who said that such a limit merely invited sloppy programming; there should be no limits to a ``well-written'' ed. Thus, Draft 8 removed the limit and inserted rationale that this meant ed would have to process files of virtually unlimited size. (Surprisingly, no objections or comments were received about that sentence.)

However, in discussing the matter with representatives of POSIX.3, it turned out that omitting the limit had a side effect. A corresponding test assertion would also be omitted, and no test suite could legitimately stress ed with large files. It quickly became clear that restrictions applied to other utilities as well and a solution was needed. It is not possible for this standard to judge which utilities are in the category with arbitrary file size limits; this would impose too much on implementors. Therefore, the burden is placed on implementors to publicly document any limitations. The resulting pressure in the marketplace should keep most implementations adequate for most portable applications. Typically, larger systems would have larger limits than smaller systems. But since price typically follows function, the user can use such information to select a machine that handles his/her problems reasonably.
The working group considered adding a limit in 2.13.1 for every file-oriented utility, but felt these limits would not actually be used by real applications and would reduce consensus. This is particularly true for utilities, such as possibly awk or yacc, that might have rather complex limits not directly related to the actual file size.

The definition of text file (see 2.2.2.151) is strictly enforced for input to the standard utilities; very few of them list exceptions to the undefined results called for here. (Of course, ``undefined'' here does not mean that existing implementations necessarily have to change to start indicating error conditions. Conforming applications cannot rely on implementations succeeding or failing when nontext files are used.)

The utilities that allow line continuation are generally those that accept input languages, rather than pure data. It would be unusual for an input line of this type to exceed {LINE_MAX} bytes. It would also be unreasonable to require that the implementation allow unlimited accumulation of multiple lines, each of which could reach {LINE_MAX}. Thus, for a portable application, the total of all the continued lines in a set cannot exceed {LINE_MAX}.

The format description is intended to be sufficiently rigorous to allow other applications to generate these input files. However, <blank>s can legitimately be included in some of the fields described by the standard utilities, particularly in locales other than the POSIX Locale. As a result, this intent is not always realized.

END_RATIONALE

2.11.5.3 Environment Variables

The Environment Variables subclause lists what variables affect the utility's execution.
The entire manner in which environment variables described in this standard affect the behavior of each utility is described in the Environment Variables subclause for that utility. This applies in conjunction with the global effects of the LANG and LC_ALL environment variables described in 2.6. The existence or value of environment variables described in this standard shall not otherwise affect the specified behavior of the standard utilities. Any effects of the existence or value of environment variables not described by this standard upon the standard utilities are unspecified.

Some standard utilities use environment variables as a means for selecting a utility to execute (such as CC in make). For those utilities, the string provided to the utility shall be subjected to the path search described for PATH in 2.6.

Default Behavior: When this subclause is listed as ``None,'' it means that the behavior of the utility is not directly affected by environment variables described by this standard when the utility is used as described by this standard.

BEGIN_RATIONALE

2.11.5.3.1 Environment Variables Rationale. (This subclause is not a part of P1003.2)

The global default text about the PATH search is overkill in this version of POSIX.2 (prior to the UPE) because only one of the standard utilities specifies variables in this way--make's $(CC), $(LEX), etc. It is described here mostly in anticipation of its heavier usage in POSIX.2a. The description of PATH indicates separately that the path search does not apply to names containing slashes, so it does not apply to them here either.

END_RATIONALE

2.11.5.4 Asynchronous Events

The Asynchronous Events subclause lists how the utility reacts to such events as signals and what signals are caught.

Default Behavior: When this subclause is listed as ``Default,'' or it refers to ``the standard action for all other signals; see 2.11.5.4,'' it means that the action taken as a result of the signal shall be one of the following:
All rights reserved. This is an unapproved IEEE Standards Draft, subject to change. 2.11 Utility Description Defaults 189 P1003.2/D11.2 INFORMATION TECHNOLOGY--POSIX (1) The action is that inherited from the parent according to the rules of inheritance of signal actions defined in POSIX.1 {8} (see 2.9.1), or (2) When no action has been taken to change the default, the default action is that specified by POSIX.1 {8}, or (3) The result of the utility's execution is as if default actions had been taken. BEGIN_RATIONALE 2.11.5.4.1 Asynchronous Events Rationale. (_T_h_i_s _s_u_b_c_l_a_u_s_e _i_s _n_o_t _a _p_a_r_t _o_f _P_1_0_0_3._2) Because there is no language prohibiting it, a utility is permitted to catch a signal, perform some additional processing (such as deleting temporary files), restore the default signal action (or action inherited from the parent process) and resignal itself. END_RATIONALE 2.11.6 External Effects The External Effects subclause describes the effects of the utility on the operational environment, including the file system. There are three subclauses that contain all the substantive information about external effects; because of this, this level of header is usually left blank. Certain of the standard utilities describe how they can invoke other utilities or applications, such as by passing a command string to the command interpreter. The external effects of such invoked utilities are not described in the subclause concerning the standard utility that invokes them. 2.11.6.1 Standard Output The Standard Output subclause describes the standard output of the utility. This subclause is frequently merely a reference to the following subclause, Output Files, because many utilities treat standard output and output files in the same manner. Use of a terminal for standard output may cause any of the standard utilities that write standard output to stop when used in the background. 
For this reason, applications should not use interactive features in scripts to be placed in the background.
Record formats are described in a notation similar to that used by the C language function printf(). See 2.12 for a description of this notation.
The specified standard output of the standard utilities shall not depend on the existence or value of the environment variables defined in this standard, except as provided by this standard.
Default Behavior: When this subclause is listed as ``None,'' it means that the standard output shall not be written when the utility is used as described by this standard.
BEGIN_RATIONALE
2.11.6.1.1 Standard Output Rationale. (This subclause is not a part of P1003.2)
This subclause was globally renamed from Standard Output Format in previous drafts to better reflect its role in describing the existence and usage of the file, in addition to its format. The format description is intended to be sufficiently rigorous to allow post-processing of output by other programs, particularly by an awk or lex parser.
END_RATIONALE
2.11.6.2 Standard Error
The Standard Error subclause describes the standard error output of the utility. Only those messages that are purposely sent by the utility are described.
Use of a terminal for standard error may cause any of the standard utilities that write standard error output to stop when used in the background. For this reason, applications should not use interactive features in scripts to be placed in the background.
The format of diagnostic messages for most utilities is unspecified, but the language and cultural conventions of diagnostic and informative messages whose format is unspecified by this standard should be affected by the setting of LC_MESSAGES.
The specified standard error output of the standard utilities shall not depend on the existence or value of the environment variables defined in this standard, except as provided by this standard.
Default Behavior: When this subclause is listed as ``Used only for diagnostic messages,'' it means that, unless otherwise stated, the diagnostic messages shall be sent to the standard error only when the exit status is nonzero and the utility is used as described by this standard.
When this subclause is listed as ``None,'' it means that the standard error shall not be used when the utility is used as described in this standard.
BEGIN_RATIONALE
2.11.6.2.1 Standard Error Rationale. (This subclause is not a part of P1003.2)
This subclause was globally renamed from Standard Error Format in previous drafts to better reflect its role in describing the existence and usage of the file, in addition to its format.
This subclause does not describe error messages that refer to incorrect operation of the utility. Consider a utility that processes program source code as its input. This subclause is used to describe messages produced by a correctly operating utility that encounters an error in the program source code that it is processing. However, a message indicating that the utility had insufficient memory in which to operate would not be described.
Some compilers have traditionally produced warning messages without returning a nonzero exit status; these are specifically noted in their subclauses. Other utilities are expected to remain absolutely quiet on the standard error if they want to return zero, unless the implementation provides some sort of extension to increase the verbosity or debugging level.
The format descriptions are intended to be sufficiently rigorous to allow post-processing of output by other programs.
END_RATIONALE
2.11.6.3 Output Files
The Output Files subclause describes the files created or modified by the utility.
Temporary or system files that are created for internal usage by this utility or other parts of the implementation (spool, log, audit files, etc.) are not described in this, or any, subclause. The utilities creating such files and the names of such files are unspecified. If applications are written to use temporary or intermediate files, they should use the TMPDIR environment variable, if it is set and represents an accessible directory, to select the location of temporary files.
Implementations shall ensure that temporary files, when used by the standard utilities, are named so that different utilities or multiple instances of the same utility can operate simultaneously without regard to their working directories, or any other process characteristic other than process ID. There are two exceptions to this requirement:
(1) Resources for temporary files other than the namespace (for example, disk space, available directory entries, or number of processes allowed) are not guaranteed.
(2) Certain standard utilities generate output files that are intended as input for other utilities (for example, lex generates lex.yy.c), and these cannot have unique names. These cases are explicitly identified in the descriptions of the respective utilities.
Any temporary files created by the implementation shall be removed by the implementation upon a utility's successful exit, upon exit because of errors, or before termination by any of the SIGHUP, SIGINT, or SIGTERM signals, unless specified otherwise by the utility description.
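The temporary-file guidance above, combined with the resignaling technique that the Asynchronous Events rationale permits, can be sketched in C. This is a minimal, non-authoritative illustration that assumes a POSIX.1-2008 environment (for mkstemp() and sigaction(), which postdate this draft); the names temp_dir(), make_temp(), on_fatal_signal(), install_cleanup_handlers(), and the "util" file-name prefix are hypothetical. SIGQUIT is deliberately left uncaught, in keeping with the Output Files rationale.

```c
/* Sketch only: assumes a POSIX.1-2008 system for mkstemp() and sigaction(). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Name of the current temporary file (hypothetical fixed-size buffer). */
static char tmp_path[4096];

/* Use TMPDIR if it is set and names an accessible directory; else /tmp. */
static const char *temp_dir(void)
{
    const char *dir = getenv("TMPDIR");
    struct stat st;

    if (dir != NULL && dir[0] != '\0' && stat(dir, &st) == 0 &&
        S_ISDIR(st.st_mode) && access(dir, W_OK | X_OK) == 0)
        return dir;
    return "/tmp";
}

/* Create the file; mkstemp() replaces XXXXXX with a unique suffix, so
 * concurrent instances do not collide regardless of working directory. */
static int make_temp(void)
{
    int n = snprintf(tmp_path, sizeof tmp_path, "%s/utilXXXXXX", temp_dir());
    int fd = -1;

    if (n >= 0 && (size_t)n < sizeof tmp_path)
        fd = mkstemp(tmp_path);
    if (fd == -1)
        tmp_path[0] = '\0';          /* nothing for the handler to remove */
    return fd;
}

/* Remove the file, restore the default action, and resignal.  The signal
 * stays blocked while the handler runs, so the re-raised signal is
 * delivered, with its default action, as soon as the handler returns. */
static void on_fatal_signal(int sig)
{
    if (tmp_path[0] != '\0')
        unlink(tmp_path);
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Catch SIGHUP, SIGINT, and SIGTERM, but leave alone any signal that was
 * ignored on entry (for example, SIGINT in a background job). */
static void install_cleanup_handlers(void)
{
    static const int sigs[] = { SIGHUP, SIGINT, SIGTERM };

    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++) {
        struct sigaction old, sa;

        if (sigaction(sigs[i], NULL, &old) == 0 && old.sa_handler == SIG_IGN)
            continue;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_fatal_signal;
        sigemptyset(&sa.sa_mask);
        sigaction(sigs[i], &sa, NULL);
    }
}
```

On a normal or error exit, such a program would still close and unlink the file itself; the handler covers only the signal cases.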
Record formats are described in a notation similar to that used by the C language function printf(). See 2.12 for a description of this notation.
Default Behavior: When this subclause is listed as ``None,'' it means that no files are created or modified as a consequence of direct action on the part of the utility when the utility is used as described by this standard. However, the utility may create or modify system files, such as log files, that are outside of the utility's normal execution environment.
BEGIN_RATIONALE
2.11.6.3.1 Output Files Rationale. (This subclause is not a part of P1003.2)
This subclause was globally renamed from Output File Formats in previous drafts to better reflect its role in describing the existence and usage of the files, in addition to their format. The format description is intended to be sufficiently rigorous to allow post-processing of output by other programs, particularly by an awk or lex parser.
Receipt of the SIGQUIT signal should generally cause termination (unless in some debugging mode) that would bypass any attempted recovery actions.
END_RATIONALE
2.11.7 Extended Description
The Extended Description subclause provides a place for describing the actions of very complicated utilities, such as text editors or language processors, which typically have elaborate command languages.
Default Behavior: When this subclause is listed as ``None,'' no further description is necessary.
2.11.8 Exit Status
The Exit Status subclause describes the values the utility shall return to the calling program, or shell, and the conditions that cause these values to be returned. Usually, utilities return zero for successful completion and values greater than zero for various error conditions.
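The advice in the Exit Status rationale, to test for the documented success value rather than for any one error value, can be sketched in C with system(). This is a hedged example: command_succeeded() is a hypothetical helper, and it assumes system() runs the command through a POSIX shell, which reports a command that cannot be found with a status greater than 125 (typically 127; see 3.8.2).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Return nonzero only if the command ran and exited with status zero.
 * Any other outcome (a nonzero exit status of any value, termination by
 * a signal, or failure to run the shell) is treated as failure, so no
 * specific error value is relied upon. */
static int command_succeeded(const char *command)
{
    int status = system(command);

    if (status == -1)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```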
If specific numeric values are listed in this subclause, conforming implementations shall use those values for the errors described. In some cases, status values are listed more loosely, such as ``>0.'' A Strictly Conforming POSIX.2 Application shall not rely on any specific value in the range shown and shall be prepared to receive any value in the range. Unspecified error conditions may be represented by specific values not listed in the standard.
BEGIN_RATIONALE
2.11.8.1 Exit Status Rationale. (This subclause is not a part of P1003.2)
Note the additional discussion of exit status values in 3.8.2. It describes requirements for returning exit values > 125.
A utility may list zero as a successful return, 1 as a failure for a specific reason, and >1 as ``an error occurred.'' In this case, unspecified conditions may cause a 2 or 3, or other value, to be returned. A Strictly Conforming POSIX.2 Application should be written so that it tests for successful exit status values (zero in this case), rather than relying upon the single specific error value listed in the standard. In that way, it will have maximum portability, even on implementations with extensions.
The working group is aware that the general nonenumeration of errors makes it difficult to write test suites that test the incorrect operation of utilities. There are some historical implementations that have expended effort to provide detailed status messages and a helpful environment to bypass or explain errors, such as prompting, retrying, or ignoring unimportant syntax errors; other implementations have not.
Since there is no realistic way to mandate system behavior in cases of undefined application actions or system problems in a manner acceptable to all cultures and environments, attention has been limited to the correct operation of utilities by the conforming application. Furthermore, the portable application does not need detailed information concerning errors that it caused through incorrect usage or that it cannot correct anyway. The high degree of competition in the emerging POSIX marketplace should ensure that users requiring friendly, resilient environments will be able to purchase such environments without detailed specification in this standard.
There is no description of defaults for this subclause because all of the standard utilities specify something (or explicitly state ``Unspecified'') for Exit Status.
END_RATIONALE
2.11.9 Consequences of Errors
The Consequences of Errors subclause describes the effects on the environment, file systems, process state, etc., when error conditions occur. It does not describe error messages produced or exit status values used.
The many reasons for failure of a utility are generally not specified by the utility descriptions. Utilities may terminate prematurely if they encounter: invalid usage of options, arguments, or environment variables; invalid usage of the complex syntaxes expressed in Extended Description subclauses; difficulties accessing, creating, reading, or writing files; or difficulties associated with the privileges of the process.
The following shall apply to each utility, unless otherwise stated:
- If the requested action cannot be performed on an operand representing a file, directory, user, process, etc., the utility shall issue a diagnostic message to standard error and continue processing the next operand in sequence, but the final exit status shall be returned as nonzero.
- If the requested action characterized by an option or option-argument cannot be performed, the utility shall issue a diagnostic message to standard error and the exit status returned shall be nonzero.
- When an unrecoverable error condition is encountered, the utility shall exit with a nonzero exit status.
- A diagnostic message shall be written to standard error whenever an error condition occurs.
Default Behavior: When this subclause is listed as ``Default,'' it means that any changes to the environment are unspecified.
BEGIN_RATIONALE
2.11.9.1 Consequences of Errors Rationale. (This subclause is not a part of P1003.2)
When a utility encounters an error condition, several actions are possible, depending on the severity of the error and the state of the utility. Included in the possible actions of various utilities are: deletion of temporary or intermediate work files; deletion of incomplete files; and validity checking of the file system or directory.
In Draft 9, most of the Consequences of Errors subclauses were changed to ``Default.'' This is due to the more elaborate description of the default case now carried in this subclause and the fact that most of the standard utilities actually use that default.
END_RATIONALE
BEGIN_RATIONALE
2.11.10 Rationale
This subclause provides historical perspective and justification of working group actions concerning the utility.
Examples, Usage
This subclause provides examples and usage of the utility. In some cases certain characters are interpreted as special characters to the shell. In the rest of the standard, these characters are shown without escape characters or quoting (see 3.2).
In all examples, however, quoting has been used, showing how sample commands (utility names combined with arguments) could be passed correctly to a shell (see sh in 4.56) or as a string to the system() function.
History of Decisions Made
This subclause provides historical perspective for decisions that were made.
Unresolved Objections
These subclauses were removed from Draft 10. The Unresolved Objections are maintained in a separate list and do not meet ISO editing requirements for an informative annex.
2.11.10.1 Rationale Rationale. (This subclause is not a part of P1003.2)
The Rationale subclauses will be moved to Annex E in the final POSIX.2. Some of the subheadings may be collapsed in that document; in these drafts the working group has not always been very rigorous about what is a description of usage versus a history of decisions made, for example. The final rationale will de-emphasize the chronological aspects of working group decisions.
END_RATIONALE
2.12 File Format Notation
The Standard Input, Standard Output, Standard Error, Input Files, and Output Files subclauses of the utility descriptions, when provided, use a syntax to describe the data organization within the files, when that organization is not otherwise obvious. The syntax is similar to that used by the C language printf() function, as described in this clause.
When used in Standard Input or Input Files subclauses of the utility descriptions, this syntax describes the format that could have been used to write the text to be read, not a format that could be used by the C language scanf() function to read the input file.
The description of an individual record is as follows:
    "<format>", [ <arg1>, <arg2>, ..., <argn> ]
The format is a character string that contains three types of objects, defined below:
characters  Characters that are not escape sequences or conversion specifications, as described below, shall be copied to the output.
escape sequences  Represent nongraphic characters.
conversion specifications  Specify the output format of each argument. (See below.)
The following characters have the following special meaning in the format string:
" "    (An empty character position.) One or more <blank> characters.
W      Exactly one <space> character.
The escape sequences in Table 2-15 depict the associated action on display devices capable of the action.
Table 2-15 - Escape Sequences
__________________________________________________________________________
Escape     Represents
Sequence   Character          Terminal Action
__________________________________________________________________________
\\         backslash          None.
\a         <alert>            Attempts to alert the user through audible
                              or visible notification.
\b         <backspace>        Moves the printing position to one column
                              before the current position, unless the
                              current position is the start of a line.
\f         <form-feed>        Moves the printing position to the initial
                              printing position of the next logical page.
\n         <newline>          Moves the printing position to the start of
                              the next line.
\r         <carriage-return>  Moves the printing position to the start of
                              the current line.
\t         <tab>              Moves the printing position to the next tab
                              position on the current line. If there are
                              no more tab positions left on the line, the
                              behavior is undefined.
\v         <vertical-tab>     Moves the printing position to the start of
                              the next vertical tab position. If there are
                              no more vertical tab positions left on the
                              page, the behavior is undefined.
__________________________________________________________________________
Each conversion specification shall be introduced by the percent-sign character (%). After the character %, the following shall appear in sequence:
flags  Zero or more flags, in any order, that modify the meaning of the conversion specification.
field width  An optional string of decimal digits to specify a minimum field width. For an output field, if the converted value has fewer bytes than the field width, it shall be padded on the left [or right, if the left-adjustment flag (-), described below, has been given] to the field width.
precision  Gives the minimum number of digits to appear for the d, o, i, u, x, or X conversions (the field shall be padded with leading zeros), the number of digits to appear after the radix character for the e and f conversions, the maximum number of significant digits for the g conversion, or the maximum number of bytes to be written from a string in the s conversion. The precision shall take the form of a period (.) followed by a decimal digit string; a null digit string shall be treated as zero.
conversion characters  A conversion character (see below) that indicates the type of conversion to be applied.
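Because this notation is modeled on the C printf() function, the C library offers a convenient way to see how field width and precision combine with the flags and conversion characters defined in the lists that follow. The sketch below is illustrative only: fmt() and show_examples() are hypothetical helpers, and the bracketed results in the comments assume a C99 (or later) library running in the POSIX (C) locale, where the radix character is '.'. Unlike this notation, C's %d never adds <blank>s of its own.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Format into a static buffer (overwritten by each call) so that a single
 * conversion can be printed or compared. */
static const char *fmt(const char *format, ...)
{
    static char buf[128];
    va_list ap;

    va_start(ap, format);
    vsnprintf(buf, sizeof buf, format, ap);
    va_end(ap);
    return buf;
}

/* Brackets make the padding visible; each comment shows the result. */
static void show_examples(void)
{
    /* Field width and precision */
    puts(fmt("[%6d]", 42));          /* [    42]  padded on the left       */
    puts(fmt("[%-6d]", 42));         /* [42    ]  '-' pads on the right    */
    puts(fmt("[%6.4d]", 42));        /* [  0042]  precision: min digits    */
    puts(fmt("[%.0d]", 0));          /* []        zero with precision 0    */
    puts(fmt("[%2d]", 12345));       /* [12345]   width never truncates    */
    puts(fmt("[%.3s]", "abcdef"));   /* [abc]     precision: max bytes     */
    puts(fmt("[%.2f]", 3.14159));    /* [3.14]    digits after radix       */

    /* Flags */
    puts(fmt("[%+d]", 5));           /* [+5]                               */
    puts(fmt("[% d]", 5));           /* [ 5]      <space> flag             */
    puts(fmt("[% +d]", 5));          /* [+5]      <space> ignored with +   */
    puts(fmt("[%#o]", 8));           /* [010]                              */
    puts(fmt("[%#x]", 255));         /* [0xff]                             */
    puts(fmt("[%#.0f]", 3.0));       /* [3.]      radix character kept     */
    puts(fmt("[%05d]", -42));        /* [-0042]   zeroes follow the sign   */
    puts(fmt("[%-05d]", 42));        /* [42   ]   0 ignored with -         */
    puts(fmt("[%05.3d]", 7));        /* [  007]   0 ignored with precision */

    /* e, g, and c conversions */
    puts(fmt("[%e]", 12345.678));    /* [1.234568e+04]                     */
    puts(fmt("[%g]", 0.0001));       /* [0.0001]  exponent -4: style f     */
    puts(fmt("[%g]", 0.00001));      /* [1e-05]   exponent < -4: style e   */
    puts(fmt("[%g]", 1234567.0));    /* [1.23457e+06] exponent >= 6        */
    puts(fmt("[%#g]", 1.0));         /* [1.00000] # keeps trailing zeroes  */
    puts(fmt("[%c]", 65));           /* [A]       in ASCII                 */
}
```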
The flag characters and their meanings are:
-        The result of the conversion shall be left-justified within the field.
+        The result of a signed conversion shall always begin with a sign (+ or -).
<space>  If the first character of a signed conversion is not a sign, a <space> shall be prefixed to the result. This means that if the <space> and + flags both appear, the <space> flag shall be ignored.
#        The value is to be converted to an ``alternate form.'' For c, d, i, u, and s conversions, the behavior is undefined. For o conversion, it shall increase the precision to force the first digit of the result to be a zero. For x or X conversion, a nonzero result shall have 0x or 0X prefixed to it, respectively. For e, E, f, g, and G conversions, the result shall always contain a radix character, even if no digits follow the radix character. For g and G conversions, trailing zeroes shall not be removed from the result as they usually are.
0        For d, i, o, u, x, X, e, E, f, g, and G conversions, leading zeroes (following any indication of sign or base) shall be used to pad to the field width; no space padding shall be performed. If the 0 and - flags both appear, the 0 flag shall be ignored. For d, i, o, u, x, and X conversions, if a precision is specified, the 0 flag shall be ignored. For other conversions, the behavior is undefined.
Each conversion character shall result in fetching zero or more arguments. The results are undefined if there are insufficient arguments for the format. If the format is exhausted while arguments remain, the excess arguments shall be ignored.
The conversion characters and their meanings are:
d,i,o,u,x,X  The integer argument shall be written as signed decimal (d or i), unsigned octal (o), unsigned decimal (u), or unsigned hexadecimal notation (x and X). The d and i specifiers shall convert to signed decimal in the style [-]dddd. The x conversion shall use the numbers and letters 0123456789abcdef and the X conversion shall use the numbers and letters 0123456789ABCDEF. The precision component of the argument shall specify the minimum number of digits to appear. If the value being converted can be represented in fewer digits than the specified minimum, it shall be expanded with leading zeroes. The default precision shall be 1. The result of converting a zero value with a precision of 0 shall be no characters. If both the field width and precision are omitted, the implementation may precede and/or follow numeric arguments of types d, i, and u with <blank>s; arguments of type o (octal) may be preceded with leading zeroes.
f            The floating point number argument shall be written in decimal notation in the style "[-]ddd.ddd", where the number of digits after the radix character (shown here as a decimal point) shall be equal to the precision specification. The LC_NUMERIC locale category shall determine the radix character to use in this format. If the precision is omitted from the argument, six digits shall be written after the radix character; if the precision is explicitly 0, no radix character shall appear.
e,E          The floating point number argument shall be written in the style "[-]d.ddde±dd" (the symbol ± indicates either a plus or minus sign), where there is one digit before the radix character (shown here as a decimal point) and the number of digits after it is equal to the precision. The LC_NUMERIC locale category shall determine the radix character to use in this format. When the precision is missing, six digits shall be written after the radix character; if the precision is 0, no radix character shall appear. The E conversion character shall produce a number with E instead of e introducing the exponent.
             The exponent shall always contain at least two digits. However, if the value to be written requires an exponent of more than two digits, additional exponent digits shall be written as necessary.
g,G          The floating point number argument shall be written in style f or e (or in style E in the case of a G conversion character), with the precision specifying the number of significant digits. The style used depends on the value converted: style e shall be used only if the exponent resulting from the conversion is less than -4 or greater than or equal to the precision. Trailing zeroes shall be removed from the result. A radix character shall appear only if it is followed by a digit.
c            The integer argument shall be converted to an unsigned char and the resulting byte shall be written.
s            The argument shall be taken to be a string and bytes from the string shall be written until the end of the string or the number of bytes indicated by the precision specification of the argument is reached. If the precision is omitted from the argument, it shall be taken to be infinite, so all bytes up to the end of the string shall be written.
%            Write a % character; no argument shall be converted.
In no case does a nonexistent or insufficient field width cause truncation of a field; if the result of a conversion is wider than the field width, the field shall simply be expanded to contain the conversion result.
BEGIN_RATIONALE
2.12.1 File Format Notation Rationale. (This subclause is not a part of P1003.2)
This clause was originally derived from the description of printf() in the SVID, but it has been updated following the publication of the C Standard {7}.
It is not identical to the C Standard's {7} printf(), as it deals with integers as being essentially one type, disregarding possible internal differences between int, short, and long. It has also had some of the internal C language dependencies removed (such as the requirement for null-terminated strings).
This standard provides a rigorous description of the format of utility input and output files. It is the intention of this standard that these descriptions be adequate sources of information so that portable applications can use other utilities, such as lex or awk, to reliably parse the output of these utilities as their input in, say, a pipeline.
The notation for spaces allows some flexibility for application output. Note that an empty character position in format represents one or more <blank> characters on the output (not white space, which can include <newline>s). Therefore, another utility that reads that output as its input must be prepared to parse the data using scanf(), awk, etc. The W character is used when exactly one <space> is output.
The treatment of integers and spaces is different from the real printf(), in that they can be surrounded with <blank>s. This was done so that, given a format such as:
    "%d\n", <foo>
the implementation could use a real printf() such as
    printf("%6d\n", foo);
and still conform. It would have been possible for the standard to use "%6d\n", but it would have been difficult to pick a number that would have pleased everyone. This notation is thus somewhat like scanf() in addition to printf().
The printf() function was chosen as a model because most of the working group was familiar with it and it was thought that many of the readers would be as well.
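On the reading side, this flexibility means a consumer must not assume that an integer starts in the first column. The following minimal C sketch assumes each record has already been read into a NUL-terminated buffer; parse_count() is a hypothetical name. It accepts the record "%d\n", <count> whether it was written with "%d\n" or with a padded "%6d\n", and it uses strtol() rather than sscanf() with %ld so that out-of-range input is detected instead of causing undefined behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a record described as "%d\n", <count>.  The writer may have used
 * "%d\n" or a padded form such as "%6d\n", so <blank>s are accepted on
 * either side of the digits.  Returns 0 on success, -1 otherwise. */
static int parse_count(const char *line, long *count)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(line, &end, 10);     /* skips leading white space */
    if (end == line || errno == ERANGE)
        return -1;                      /* no digits, or out of range */
    while (*end == ' ' || *end == '\t')
        end++;                          /* trailing <blank>s */
    if (*end == '\n')
        end++;                          /* the record's <newline> */
    if (*end != '\0')
        return -1;                      /* unexpected trailing text */
    *count = value;
    return 0;
}
```

For example, a writer calling printf("%6d\n", 42) emits four <space>s, the digits 42, and a <newline>; parse_count() accepts that record exactly as it accepts the unpadded "42\n".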
One difference from the C function printf() is that the l and h qualifiers are not used. As expressed by this standard, there is no differentiation between decimal values for ints versus longs versus shorts. The specifications %d or %i should be interpreted as an arbitrary length sequence of digits. Also, no distinction is made between single precision and double precision numbers (float/double in C). These are simply referred to as floating point numbers.
Many of the output descriptions in this standard use the term line, such as:
    "%s", <input line>
Since the definition of line includes the trailing <newline> character already, there is no need to include a "\n" in the format; a double <newline> would otherwise result.
In the language at the end of the clause:
    ``In no case does a nonexistent or insufficient field width cause truncation of a field; ...''
the term ``field width'' should not be confused with the term ``precision'' used in the description of %s.
Examples:
To represent the output of a program that prints a date and time in the form Sunday, July 3, 10:02, where <weekday> and <month> are strings:
    "%s,W%sW%d,W%d:%.2d\n", <weekday>, <month>, <day>, <hour>, <min>
To show pi written to 5 decimal places:
    "piW=W%.5f\n", <value of pi>
To show an input file format consisting of five colon-separated fields:
    "%s:%s:%s:%s:%s\n", <arg1>, <arg2>, <arg3>, <arg4>, <arg5>
END_RATIONALE
2.13 Configuration Values
2.13.1 Symbolic Limits
This clause lists magnitude limitations imposed by a specific implementation. The braces notation, {LIMIT}, is used in this standard to indicate these values, but the braces are not part of the name.
The values specified in Table 2-16 represent the lowest values conforming implementations shall provide; and consequently, the largest values on which an application can rely without further enquiries, as described below. These values shall be accessible to applications via the getconf utility (see 4.26) and through the interfaces described in 7.8.2, [such as _s_y_s_c_o_n_f() in the C binding]. The literal names shown in the table apply only to the getconf utility; the high-level-language binding shall describe the exact form of each name to be used by the interfaces in that binding. Implementations may provide more liberal, or less restrictive, values than shown in Table 2-16. These possibly more liberal values are accessible using the symbols in Table 2-17. The functions in 7.8.2 [such as _s_y_s_c_o_n_f() in the C binding] or the getconf utility shall return the value of each symbol on each specific implementation. The value so retrieved shall be the largest, or most liberal, value that shall be available throughout the session lifetime, as determined at session creation. The literal names shown in the table apply only to the getconf utility; the high-level-language binding shall describe the exact form of each name to be used by the interfaces in that binding. All numerical limits defined by POSIX.1 {8}, such as {PATH_MAX}, also apply to this standard. (See POSIX.1 {8} 2.8.) All the utilities defined by this standard are implicitly limited by these values, unless otherwise noted in the utility descriptions. Copyright c 1991 IEEE. All rights reserved. This is an unapproved IEEE Standards Draft, subject to change. 
Table 2-16 - Utility Limit Minimum Values
_______________________________________________________________________
Name                       Description                           Value
_______________________________________________________________________
{POSIX2_BC_BASE_MAX}       The maximum obase value allowed by    99
                           the bc utility.
{POSIX2_BC_DIM_MAX}        The maximum number of elements        2048
                           permitted in an array by the bc
                           utility.
{POSIX2_BC_SCALE_MAX}      The maximum scale value allowed by    99
                           the bc utility.
{POSIX2_BC_STRING_MAX}     The maximum length of a string        1000
                           constant accepted by the bc utility.
{POSIX2_COLL_WEIGHTS_MAX}  The maximum number of weights that    2
                           can be assigned to an entry of the
                           LC_COLLATE order keyword in the
                           locale definition file; see
                           2.5.2.2.3.
{POSIX2_EXPR_NEST_MAX}     The maximum number of expressions     32
                           that can be nested within
                           parentheses by the expr utility.
{POSIX2_LINE_MAX}          Unless otherwise noted, the maximum   2048
                           length, in bytes, of a utility's
                           input line (either standard input
                           or another file), when the utility
                           is described as processing text
                           files. The length includes room for
                           the trailing <newline>.
{POSIX2_RE_DUP_MAX}        The maximum number of repeated        255
                           occurrences of a regular expression
                           permitted when using the interval
                           notation \{m,n\}; see 2.8.3.3.
{POSIX2_VERSION}           This value indicates the version of   199???
                           the utilities in this standard that
                           are provided by the implementation.
                           It will change with each published
                           version of this standard.
_______________________________________________________________________

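The {POSIX2_LINE_MAX} entry counts the trailing <newline> as part of the line. As a sketch, assuming POSIX awk and a hypothetical helper named lines_fit, an application could check a file against that limit before handing it to a utility described as processing text files. Setting LC_ALL=C is an assumption made so that awk's length() counts bytes rather than multibyte characters.

```shell
# lines_fit FILE [MAX]  (hypothetical helper)
# Succeed if every line of FILE, counting its trailing <newline>, is at
# most MAX bytes (default 2048, the {POSIX2_LINE_MAX} minimum). Each line
# that is too long is reported on standard output as "FILE:LINE: N bytes",
# and the exit status is then 1.
lines_fit() {
    LC_ALL=C awk -v max="${2:-2048}" '
        length($0) + 1 > max + 0 {
            printf "%s:%d: %d bytes\n", FILENAME, FNR, length($0) + 1
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

For example, lines_fit input.txt || echo "input.txt has over-long lines" >&2. A final line that lacks its <newline> is still counted as if it had one, which errs on the safe side.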
Table 2-17 - Symbolic Utility Limits
________________________________________________________________________________
Name                Description                        Minimum Value
________________________________________________________________________________
{BC_BASE_MAX}       The maximum obase value allowed    {POSIX2_BC_BASE_MAX}
                    by the bc utility.
{BC_DIM_MAX}        The maximum number of elements     {POSIX2_BC_DIM_MAX}
                    permitted in an array by the bc
                    utility.
{BC_SCALE_MAX}      The maximum scale value allowed    {POSIX2_BC_SCALE_MAX}
                    by the bc utility.
{BC_STRING_MAX}     The maximum length of a string     {POSIX2_BC_STRING_MAX}
                    constant accepted by the bc
                    utility.
{COLL_WEIGHTS_MAX}  The maximum number of weights      {POSIX2_COLL_WEIGHTS_MAX}
                    that can be assigned to an entry
                    of the LC_COLLATE order keyword
                    in the locale definition file;
                    see 2.5.2.2.3.
{EXPR_NEST_MAX}     The maximum number of expressions  {POSIX2_EXPR_NEST_MAX}
                    that can be nested within
                    parentheses by the expr utility.
{LINE_MAX}          Unless otherwise noted, the        {POSIX2_LINE_MAX}
                    maximum length, in bytes, of a
                    utility's input line (either
                    standard input or another file),
                    when the utility is described as
                    processing text files. The
                    length includes room for the
                    trailing <newline>.
{RE_DUP_MAX}        The maximum number of repeated     {POSIX2_RE_DUP_MAX}
                    occurrences of a regular
                    expression permitted when using
                    the interval notation \{m,n\};
                    see 2.8.3.3.
________________________________________________________________________________

It is not guaranteed that the application can in fact push a value to the implementation's specified limit in any given case, or at all, as a lack of virtual memory or other resources may prevent this. The limit value indicates only that the implementation does not specifically impose any arbitrary, more restrictive limit.

BEGIN_RATIONALE

2.13.1.1 Symbolic Limits Rationale. (This subclause is not a part of P1003.2)

This clause grew out of an idea that originated in POSIX.1 {8}, in the form of sysconf() and pathconf(). (In fact, the same person wrote the original text for both standards.) The idea is that a Strictly Conforming POSIX.2 Application can be written to use the most restrictive values that a minimal system can provide, but it should not have to. The values shown in Table 2-17 represent compromises so that some vendors can use historically limited versions of UNIX system utilities. They are the highest values that Strictly Conforming POSIX.2 Applications or Conforming POSIX.2 Applications can assume, given no other information. However, by using getconf or sysconf(), the elegant application can tailor itself to the more liberal values on some of the specific instances of specific implementations.

There is no explicitly stated requirement that an implementation provide finite limits for any of these numeric values; the implementation is free to provide essentially unbounded capabilities (where it makes sense), stopping only at reasonable points such as {ULONG_MAX} (from the C Standard {7} via POSIX.1 {8}).
Therefore, applications desiring to tailor themselves to the values on a particular implementation need to be ready for possibly huge values; it may not be a good idea to blindly allocate a buffer for an input line based on the value of {LINE_MAX}, for instance. However, unlike POSIX.1 {8}, there is no set of limits in this standard that returns a special indication meaning ``unbounded.'' The implementation should always return an actual number, even if the number is very large.

The statement ``It is not guaranteed that the application ...'' is an indication that many of these limits are designed to ensure that implementors design their utilities without arbitrary constraints related to unimaginative programming. There are certainly conditions under which combinations of options can cause failures that would not render an implementation nonconforming. For example, {EXPR_NEST_MAX} and {ARG_MAX} could collide when expressions are large; combinations of {BC_SCALE_MAX} and {BC_DIM_MAX} could exceed virtual memory.

In POSIX.2, the notion of a limit being guaranteed for the process lifetime, as it is in POSIX.1 {8}, is not as useful to a shell script. The getconf utility is probably a process itself, so the guarantee would be valueless. Therefore, POSIX.2 requires the guarantee to be for the session lifetime. This will mean that many vendors will either return very conservative values or possibly implement getconf as a built-in.

It may seem confusing to have limits that apply only to a single utility grouped into one global clause. However, the alternative, which would be to disperse them into their utility description clauses, would cause great difficulty when sysconf() and getconf were described. Therefore, the working group chose the global approach.

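One way to act on the advice about huge values, sketched under the assumption of a POSIX.2-style sh, is to cap a retrieved limit at an application-chosen bound before sizing anything from it. The helper name cap_limit and the 1048576-byte cap used below are hypothetical choices, not requirements of this standard. The comparison works on the digit strings rather than with $(( )), because arithmetic expansion is only required to handle signed long values, and a limit near {ULONG_MAX} would not fit.

```shell
# cap_limit VALUE CAP  (hypothetical helper)
# Print the smaller of two non-negative decimal integers written without
# leading zeros. A shorter digit string is the smaller number; for strings
# of equal length, numeric order and byte order agree, so the sort utility
# in the C locale settles the tie without any shell arithmetic.
cap_limit() {
    if [ "${#1}" -lt "${#2}" ]; then
        printf '%s\n' "$1"
    elif [ "${#1}" -gt "${#2}" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1
    fi
}
```

For example, cap_limit 18446744073709551615 1048576 prints 1048576. Callers should first confirm that the value really is a decimal number, since an empty string would be treated as the smallest value.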
Each language binding could provide symbol names that are slightly different from those shown here. For example, the C binding prefixes the symbols with a leading underscore.

The following comments describe selection criteria for the symbols and their values.

{ARG_MAX}
     This is defined by POSIX.1 {8}. Unfortunately, it is very difficult for a portable application to deal with this value, as it does not know how much of its argument space is being consumed by the user's environment variables.

{BC_BASE_MAX}
{BC_DIM_MAX}
{BC_SCALE_MAX}
     These were originally one value, {BC_SCALE_MAX}, but it was unreasonable to link all three concepts into one limit.

{CHILD_MAX}
     This is defined by POSIX.1 {8}.

{CUT_FIELD_MAX}
     This value was removed from an earlier draft. It represented the maximum length of the list argument to the cut -c or -f options. Since the length is now unspecified, the utility has to deal with arbitrarily long lists, as long as {ARG_MAX} is not exceeded.

{CUT_LINE_MAX}
     This value was removed from an earlier draft. Historical cuts have had input line limits of 1024; this removal therefore mandates that a conforming cut shall process files with lines of unlimited length.

{DEPTH_MAX}
     This directory-traversing depth limit (which at one time applied to rm and find) was removed from an earlier draft for two major reasons:
     (1) It could be a security problem if utilities searching for files could not descend below a published depth; this would be a semi-reliable means of hiding files from the administrator.
     (2) There is no reason a reasonable implementation should have to limit itself in this way.

{ED_FILE_MAX}
     This value was removed from an earlier draft.
     Historical eds have had very small file limits; since {ED_FILE_MAX} is no longer specified, implementations have to document the limits as described in 2.11. It is recommended that implementations set much more reasonable file size limits as they modify ed to deal with other features required by POSIX.2.

{ED_LINE_MAX}
     This value was removed from an earlier draft. Historical eds have had small input line limits; this removal therefore mandates that a conforming ed shall process files with lines of length {LINE_MAX}.

{COLL_WEIGHTS_MAX}
     The weights assigned to order can be considered as ``passes'' through the collation algorithm.

{EXPR_NEST_MAX}
     The value for expression nesting was borrowed from the C Standard {7}.

{FIND_DEPTH_MAX}
     This was removed from an earlier draft in favor of a common value, {DEPTH_MAX}.

{FIND_FILESYS_MAX}
     This was removed from an earlier draft. It indicated the limit of the number of file systems that find could traverse in its search. It was dropped because this standard does not really acknowledge the historical nature of separate file systems.

{FIND_NEWER_MAX}
     This value, which allowed find to limit the number of -newer operands it processed, was deleted from an earlier draft. It was felt to be a vestige of a particular implementation with an incorrect programming algorithm that should not limit applications.

{JOIN_LINE_MAX}
     This value was removed from an earlier draft. Historical joins have had input line limits of 1024; this removal therefore mandates that a conforming join shall process files with lines of length {LINE_MAX}.

{LINE_MAX}
     This is a global limit that affects all utilities, unless otherwise noted. The {MAX_CANON} value from POSIX.1 {8} may further limit input lines from terminals.

     The {LINE_MAX} value was the subject of much debate and is a compromise between those who wished for unlimited lines and those who understood that many historical utilities were written with fixed buffers. Frequently, utility writers selected the UNIX system constant BUFSIZ to allocate these buffers; therefore, some utilities were limited to 512 bytes for I/O lines, while others achieved 4096 or greater.

     It should be noted that {LINE_MAX} applies only to input line length; there is no requirement in the standard that limits the length of output lines. Utilities such as awk, sed, and paste could theoretically construct lines longer than any of the input lines they received, depending on the options used or the instructions from the application. They are not required to truncate their output to {LINE_MAX}. It is the responsibility of the application to deal with this. If the output of one of those utilities is to be piped into another of the standard utilities, line-length restrictions will have to be considered; the fold utility, among others, could be used to ensure that only reasonable line lengths reach utilities or applications.

{LINK_MAX}
     This is defined by POSIX.1 {8}.

{LP_LINE_MAX}
     This value was removed from an earlier draft. Since so little is being required for the details of the lp utility, it made little sense to specify how long its output lines are. Thus, implementations of lp will be expected to deal with lines up to {LINE_MAX}, but whether those lines print sensibly on every device is unspecified.

{MAX_CANON}
     This is defined by POSIX.1 {8}.

{MAX_INPUT}
     This is defined by POSIX.1 {8}.

{NAME_MAX}
     This is defined by POSIX.1 {8}.

{NGROUPS_MAX}
     This is defined by POSIX.1 {8}.

{OPEN_MAX}
     This is defined by POSIX.1 {8}.

{PATH_MAX}
     This is defined by POSIX.1 {8}.

{PIPE_BUF}
     This is defined by POSIX.1 {8}.

{RM_DEPTH_MAX}
     This was removed from an earlier draft in favor of a common value, {DEPTH_MAX}.

{RE_DUP_MAX}
     The value selected is consistent with historical practice.

{SED_PATTERN_MAX}
     This symbolic value, the size of the sed pattern space, was replaced by a specific value in the sed description. It is unlikely that any real application would ever need to access this value symbolically.

{SORT_LINE_MAX}
     This was removed from an earlier draft. Now that cut and fold can handle unlimited-length input lines, a special long input line limit for sort is not needed.

There are different limits associated with command lines and input to utilities, depending on the method of invocation. In the case of a C program exec-ing a utility, {ARG_MAX} is the underlying limit. In the case of the shell reading a script and exec-ing a utility, {LINE_MAX} limits the length of lines the shell is required to process and {ARG_MAX} will still be a limit. If a user is entering a command on a terminal to the shell, requesting that it invoke the utility, {MAX_INPUT} may restrict the length of the line that can be given to the shell to a value below {LINE_MAX}.

END_RATIONALE

2.13.2 Symbolic Constants for Portability Specifications

Table 2-18 - Optional Facility Configuration Values
__________________________________________________________________________
Name                Description
__________________________________________________________________________
{POSIX2_C_BIND}     The C language development facilities in Annex A
                    support the C Language Bindings Option (see Annex B).
{POSIX2_C_DEV}      The system supports the C Language Development
                    Utilities Option (see Annex A).
{POSIX2_FORT_DEV}   The system supports the FORTRAN Development Utilities
                    Option (see Annex C).
{POSIX2_FORT_RUN}   The system supports the FORTRAN Runtime Utilities
                    Option (see Annex C).
{POSIX2_LOCALEDEF}  The system supports the creation of locales as
                    described in 4.35.
{POSIX2_SW_DEV}     The system supports the Software Development Utilities
                    Option (see Section 6).
__________________________________________________________________________

Table 2-18 lists symbols that can be used by the application to determine which optional facilities are present on the implementation. The functions defined in 7.8.2 [such as sysconf()] or the getconf utility can be used to retrieve the value of each symbol on each specific implementation. The literal names shown in the table apply only to the getconf utility; the high-level-language binding shall describe the exact form of each name to be used by the interfaces in that binding.

Each of these symbols shall be considered a valid name by the implementation. Each shall be defined on the system with a value of 1 if the corresponding option is supported; otherwise, the symbol shall be undefined.

BEGIN_RATIONALE

2.13.2.1 Symbolic Constants for Portability Specifications Rationale. (This subclause is not a part of P1003.2)

When an option is supported, getconf returns a value of 1. For example, when C development is supported:

     if [ "$(getconf POSIX2_C_DEV)" = 1 ]; then
          echo C supported
     fi

The sysconf() function in the C binding would return 1.

The following comments describe selection criteria for the symbols and their values.

{POSIX2_C_BIND}
{POSIX2_C_DEV}
{POSIX2_FORT_DEV}
{POSIX2_SW_DEV}
     These were renamed from _POSIX_* in Draft 9 after it was pointed out that each of the POSIX standards should generally keep to its own namespace.

     It is possible for some (usually privileged) operations to remove utilities that support these options, or otherwise render these options unsupported. The header files, the sysconf() function, or the getconf utility will not necessarily detect such actions, in which case they should not be considered as rendering the implementation nonconforming. A test suite should not attempt tests like:

          rm /usr/bin/c89
          getconf POSIX2_C_DEV

{POSIX2_LOCALEDEF}
     This symbol was introduced to allow implementations to restrict supported locales to only those supplied by the implementation.

END_RATIONALE

Section 3: Shell Command Language

The shell is a command language interpreter. This section describes the syntax of that command language as it is used by the sh utility and the functions in 7.1 [such as system() and popen() in the C binding].

The shell operates according to the following general overview of operations. The specific details are included in the cited clauses and subclauses of this section. The shell:

(1) Reads its input from a file (see sh in 4.56), from the -c option, or from one of the functions in 7.1. If the first line of a file of shell commands starts with the characters #!, the results are unspecified.

(2) Breaks the input into tokens: words and operators. (See 3.3.)

(3) Parses the input into simple (3.9.1) and compound (3.9.4) commands.

(4) Performs various expansions (separately) on different parts of each command, resulting in a list of pathnames and fields to be treated as a command and arguments (3.6).

(5) Performs redirection (3.7) and removes redirection operators and their operands from the parameter list.

(6) Executes a function (3.9.5), built-in (3.14), executable file, or script, giving the name of the command (or, in the case of a function within a script, the name of the script) as the ``zero'th'' argument and the remaining words and fields as parameters (3.9.1.1).

(7) Optionally waits for the command to complete and collects the exit status (3.8.2).

BEGIN_RATIONALE

3.0.1 Shell Command Language Rationale. (This subclause is not a part of P1003.2)

The System V shell was selected as the starting point for this standard. The BSD C-shell was excluded from consideration for the following reasons:

(1) Most historically portable shell scripts assume the Version 7 ``Bourne'' shell, from which the System V shell is derived.

(2) The majority of tutorial materials on shell programming assume the System V shell.

Despite the selection of the System V shell, the developers of the standard did not limit the possibilities for an upward-compatible shell command language. The only programmatic interfaces to the shell language are through the functions in 7.1 and the sh utility. Most implementations provide an interface to, and processing mode for, the shell that is suitable for direct user interaction. The behavior of this interactive mode is not defined by this standard; however, places where historically an interactive shell behaves differently from the behavior described here are noted.

(1) Aliases are not included in the base POSIX.2 because they duplicate functionality already available to applications with functions.
    In early drafts, the search order of simple command lookup was ``aliases, built-ins, functions, file system,'' and therefore an alias was necessary to create a user-defined command having the same name as a built-in. To retain this capability, the search order was changed to ``special built-ins, functions, built-ins, file system,'' and a built-in, called command, has been added, which suppresses the lookup of functions. Aliases are a part of the POSIX.2a UPE because they are widely used by human users, as differentiated from applications.

(2) All references to job control and related commands have been omitted from the base POSIX.2. POSIX.2 describes the noninteractive operation of the shell; job control is outside the scope of this standard until the UPE revision is developed. Apparently it is not widely known that traditionally, even in a job control environment, the commands executed during the execution of a shell script are not placed into separate process groups. If they were, one could not stop the execution of the shell script from the interactive shell, for example. This standard does not require or prohibit job control; it simply does not mention it.

(3) The conditional command (double bracket [[ ]]) was removed from an earlier draft. Objections were lodged that the real problem is misuse of the test command ([), and putting it into the shell is the wrong way to fix the problem. Instead, proper documentation and a new shell reserved word (!) are sufficient. Tests that require multiple test operations can be done at the shell level using individual invocations of the test command and shell logicals, rather than the error-prone -o flag of test.

(4) Exportable functions were removed from an earlier draft. See the rationale in 3.9.5.1.

The construct #!
is reserved for implementations wishing to provide that extension. If it were not reserved, the standard would disallow it by forcing it to be a comment. As it stands, a conforming application shall not use #! as the first line of a shell script.

END_RATIONALE

3.1 Shell Definitions

The following terms are used in Section 3. Because they are specific to the shell, they do not appear in 2.2.2.

3.1.1 control operator: A token that performs a control function. It is one of the following symbols:

     &   &&   (   )   ;   ;;   <newline>   |   ||

The end-of-input indicator used internally by the shell is also considered a control operator. See 3.3.

On some systems, the symbol (( is a control operator; its use produces unspecified results.

3.1.2 expand: When not qualified, the act of applying all the expansions described in 3.6.

3.1.3 field: A unit of text that is the result of parameter expansion (3.6.2), arithmetic expansion (3.6.4), command substitution (3.6.3), or field splitting (3.6.5). During command processing (see 3.9.1), the resulting fields are used as the command name and its arguments.

3.1.4 interactive shell: A processing mode of the shell that is suitable for direct user interaction. The behavior in this mode is not defined by this standard.

     NOTE: The preceding sentence is expected to change following the eventual approval of the UPE supplement.

3.1.5 name: A word consisting solely of underscores, digits, and alphabetics from the portable character set (see 2.4). The first character of a name shall not be a digit.

3.1.6 operator: Either a control operator or a redirection operator.

3.1.7 parameter: An entity that stores values. There are three types of parameters: variables (named parameters), positional parameters, and special parameters.
Parameter expansion is accomplished by introducing a parameter with the $ character. See 3.5.

3.1.8 positional parameter: A parameter denoted by a single digit or one or more digits in curly braces. See 3.5.1.

3.1.9 redirection: A method of associating files with the input/output of commands. See 3.7.

3.1.10 redirection operator: A token that performs a redirection function. It is one of the following symbols:

     <   >   >|   <<   >>   <&   >&   <<-   <>

3.1.11 special parameter: A parameter named by a single character from the following list:

     *   @   #   ?   !   -   $   0

See 3.5.2.

3.1.12 subshell: A shell execution environment, distinguished from the main or current shell execution environment by the attributes described in 3.12.

3.1.13 token: A sequence of characters that the shell considers as a single unit when reading input, according to the rules in 3.3. A token is either an operator or a word.

3.1.14 variable: A named parameter. See 3.5.

3.1.15 variable assignment [assignment]: A word consisting of the following parts:

     varname=value

When used in a context where assignment is defined to occur (see 3.9.1) and at no other time, the value (representing a word or field) shall be assigned as the value of the variable denoted by varname. The varname and value parts meet the requirements for a name and a word, respectively, except that they are delimited by the embedded unquoted equals-sign in addition to the delimiting described in 3.3. In all cases, the variable shall be created if it did not already exist. If value is not specified, the variable shall be given a null value.

An alternative form of variable assignment:

     symbol=value

(where symbol is a valid word delimited by an equals-sign, but not a valid name) produces unspecified results.

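The assignment rules in 3.1.15, in particular that an omitted value yields a null value and that varname=value is an assignment only where 3.9.1 says one occurs, can be seen in a short sketch, assuming a POSIX.2-style sh. The function and variable names are hypothetical. The form ${greeting-unset} substitutes its word only when the variable is unset, which lets the sketch tell ``unset'' apart from ``set but null'' (see 3.6.2).

```shell
# assignment_demo  (hypothetical illustration of 3.1.15)
assignment_demo() {
    unset greeting
    printf '%s\n' "${greeting-unset}"    # "unset": no such variable yet
    greeting=                            # value omitted: variable created, null
    printf '[%s]\n' "${greeting-unset}"  # "[]": it exists, with a null value
    printf '%s\n' greeting=world         # just an argument here, not an assignment
    printf '[%s]\n' "$greeting"          # still "[]": greeting was not changed
}
```

Calling assignment_demo prints the four lines unset, [], greeting=world, and [].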
3.1.16 word: A token other than an operator. In some cases a word is also a portion of a word token: in the various forms of parameter expansion (3.6.2), such as ${name-word}, and variable assignment, such as name=word, the word is the portion of the token depicted by word. The concept of a word is no longer applicable following word expansions; only fields remain (see 3.6).

BEGIN_RATIONALE

3.1.17 Shell Definitions Rationale. (This subclause is not a part of P1003.2)

The word=word form of variable assignment was included, producing unspecified results, to allow the KornShell name[expression]=value syntax to conform.

The (( symbol is a control operator in the KornShell, used for an alternative syntax of an arithmetic expression command. A strictly conforming POSIX.2 application cannot use (( as a single token [with the obvious exception of the $(( form described in POSIX.2]. The decision to require this is based solely on the pragmatic knowledge that there are many more historical shell scripts using the KornShell syntax than there might be using nested subshells, such as

     ((foo))

or

     ((foo);(bar))

The latter example should not be misinterpreted by the shell as arithmetic because attempts to balance the parentheses pairs would indicate that they are subshells. Thus, in most cases, while a few scripts will no longer be strictly portable, the chances of breaking existing scripts are even smaller.

There are no explicit limits in this standard on the sizes of names, words, lines, or other objects. However, other implicit limits do apply: shell script lines produced by many of the standard utilities cannot exceed {LINE_MAX} and the sum of exported variables comes under the {ARG_MAX} limit.
Historical shells dynamically allocate memory for names and words and parse incoming lines a byte at a time. Lines cannot have an arbitrary {LINE_MAX} limit because of historical practice such as makefiles, where make removes the <newline>s associated with the commands for a target and presents the shell with one very long line. The text in 2.11.5.2 does allow a shell to run out of memory, but it cannot have arbitrary programming limits.

END_RATIONALE

3.2 Quoting

Quoting is used to remove the special meaning of certain characters or words to the shell. Quoting can be used to preserve the literal meaning of the special characters in the next paragraph, to prevent reserved words from being recognized as such, and to prevent parameter expansion and command substitution within here-document processing (see 3.7.4).

The following characters shall be quoted if they are to represent themselves:

     |   &   ;   <   >   (   )   $   `   \   "   '   <space>   <tab>   <newline>

and the following may need to be quoted under certain circumstances. That is, these characters may be special depending on conditions described elsewhere in the standard:

     *   ?   [   #   ~   =   %

The various quoting mechanisms are the escape character, single-quotes, and double-quotes. The here-document represents another form of quoting; see 3.7.4.

3.2.1 Escape Character (Backslash)

A backslash that is not quoted shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the backslash, the shell shall interpret this as line continuation. The backslash and <newline> shall be removed before splitting the input into tokens.

3.2.2 Single-Quotes

Enclosing characters in single-quotes (' ') shall preserve the literal value of each character within the single-quotes. A single-quote cannot occur within single-quotes.

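A short sketch of 3.2.1 and 3.2.2, assuming a POSIX.2-style sh; the function name quoting_demo and its variable are hypothetical. It shows a backslash-<newline> pair disappearing before tokenizing, single-quotes keeping $, \, and " literal, and an unquoted backslash quoting exactly one character.

```shell
# quoting_demo  (hypothetical illustration of 3.2.1 and 3.2.2)
quoting_demo() {
    # Backslash-<newline> is removed before the input is split into tokens,
    # so the next two lines form the single word joined=abcdef. The
    # continuation line starts in column 1 on purpose: leading blanks there
    # would end the assignment word, and def would then be run as a command.
    joined=abc\
def
    printf '%s\n' "$joined"
    # Inside single-quotes, $, \, and " are all literal.
    printf '%s\n' 'cost: $5 \n "quoted"'
    # An unquoted backslash quotes only the character that follows it.
    printf '%s\n' \$HOME
}
```

Calling quoting_demo prints abcdef, then cost: $5 \n "quoted", then $HOME.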
3.2.3 Double-Quotes

Enclosing characters in double-quotes (" ") shall preserve the literal value of all characters within the double-quotes, with the exception of the characters dollar-sign, backquote, and backslash, as follows:

$    The dollar-sign shall retain its special meaning introducing parameter expansion (see 3.6.2), a form of command substitution (see 3.6.3), and arithmetic expansion (see 3.6.4).

     The input characters within the quoted string that are also enclosed between $( and the matching ) shall not be affected by the double-quotes, but rather shall define that command whose output replaces the $(...) when the word is expanded. The tokenizing rules in 3.3 shall be applied recursively to find the matching ).

     Within the string of characters from an enclosed ${ to the matching }, an even number of unescaped double-quotes or single-quotes, if any, shall occur. A preceding backslash character shall be used to escape a literal { or }. The rule in 3.6.2 shall be used to determine the matching }.

`    The backquote shall retain its special meaning introducing the other form of command substitution (see 3.6.3). The portion of the quoted string from the initial backquote and the characters up to the next backquote that is not preceded by a backslash, having escape characters removed, defines that command whose output replaces `...` when the word is expanded. Either of the following cases produces undefined results:

     - A single- or double-quoted string that begins, but does not end, within the `...` sequence.

     - A `...` sequence that begins, but does not end, within the same double-quoted string.

\    The backslash shall retain its special meaning as an escape character (see 3.2.1) only when followed by one of the characters:

          $   `   "   \   <newline>

A double-quote shall be preceded by a backslash to be included within double-quotes. The parameter @ has special meaning inside double-quotes and is described in 3.5.2.

BEGIN_RATIONALE
3.2.4 Quotes Rationale. (This subclause is not a part of P1003.2)

A backslash cannot be used to escape a single-quote in a single-quoted string. An embedded quote can be created by writing, for example, 'a'\''b', which yields a'b. (See 3.6.5 for a better understanding of how portions of words are either split into fields or remain concatenated.) A single token can be made up of concatenated partial strings containing all three kinds of quoting/escaping, thus permitting any combination of characters.

The escaped <newline> used for line continuation is removed entirely from the input and is not replaced by any white space. Therefore, it cannot serve as a token separator.

In double-quoting, if a backslash is immediately followed by a character that would be interpreted as having a special meaning, the backslash is deleted and the subsequent character is taken literally. If a backslash does not precede a character that would have a special meaning, it is left in place unmodified and the character immediately following it is also left unmodified. Thus, for example:

    "\$"  =>  $
    "\a"  =>  \a

It would be desirable to include the statement ``The characters from an enclosed ${ to the matching } shall not be affected by the double-quotes,'' similar to the one for $( ). However, historical practice in the System V shell prevents this.
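The backslash behavior inside double-quotes and the 'a'\''b' idiom described in this rationale can be checked with the following sketch. It assumes a POSIX-conforming sh. The variable names are illustrative only.

```shell
#!/bin/sh
# Inside double-quotes, backslash is special only before $ ` " \ <newline>.
dollar="\$"        # backslash removed, leaving: $
plain="\a"         # a is not special, so the backslash is kept: \a
backslash="\\"     # a single backslash
dquote="\""        # a single double-quote

# A single-quoted string cannot contain a single-quote, so three parts are
# concatenated into one word: 'a', an escaped quote \', and 'b'.
embedded='a'\''b'

printf '%s\n' "$dollar" "$plain" "$backslash" "$dquote" "$embedded"
```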
The requirement that double-quotes be matched inside ${...} within double-quotes, together with the rule for finding the matching } in 3.6.2, eliminates several subtle inconsistencies in expansion that arise in rare cases in historical shells. For example,

    "${foo-bar"}

yields bar when foo is not defined, and is an invalid substitution when foo is defined, in many historical shells. The differences in processing the "${...}" form have led to inconsistencies between the historical System V, BSD, and KornShells, and the text in POSIX.2 is an attempt to converge them without breaking many applications. A consequence of the new rule is that single-quotes cannot be used to quote the } within "${...}". For example,

    unset bar
    foo="${bar-'}'}"

is invalid because the "${...}" substitution contains an unpaired unescaped single-quote. The backslash can be used to escape the } in this example to achieve the desired result:

    unset bar
    foo="${bar-\}}"

The only alternative to this compromise between shells would be to make the behavior unspecified whenever the literal characters ', {, }, and " appear within ${...}. To write a portable script that uses these values, a user would have to assign variables, say,

    squote=\' dquote=\" lbrace='{' rbrace='}'
    ${foo-$squote$rbrace$squote}

rather than

    ${foo-"'}'"}

Some systems have allowed the end of the word to terminate the backquoted command substitution, such as in

    "`echo hello"

This usage is undefined in POSIX.2, where the matching backquote is required. The other undefined usage can be illustrated by the example:

    sh -c '` echo "foo`'

The description of the recursive actions involving command substitution can be illustrated with an example.
Upon recognizing the introduction of command substitution, the shell must parse input (in a new context), gathering the ``source'' for the command substitution until an unbalanced ) or ` is located. For example, in the following

    echo "$(date; echo " one" )"

the double-quote following the echo does not terminate the first double-quote; it is part of the command substitution ``script.'' Similarly, in

    echo "$(echo *)"

the asterisk is not quoted since it is inside command substitution; however,

    echo "$(echo "*")"

is quoted (and represents the asterisk character itself).
END_RATIONALE

3.3 Token Recognition

The shell reads its input in terms of lines from a file, from a terminal in the case of an interactive shell, or from a string in the case of sh -c or system(). The input lines can be of unlimited length. These lines are parsed using two major modes: ordinary token recognition and processing of here-documents.

When an io_here token has been recognized by the grammar (see 3.10), one or more of the immediately subsequent lines form the body of one or more here-documents and shall be parsed according to the rules of 3.7.4.

When it is not processing an io_here, the shell shall break its input into tokens by applying the first applicable rule below to the next character in its input. The token shall be from the current position in the input until a token is delimited according to one of the rules below; the characters forming the token are exactly those in the input, including any quoting characters. If it is indicated that a token is delimited, and no characters have been included in a token, processing shall continue until an actual token is delimited.

(1)  If the end of input is recognized, the current token shall be delimited. If there is no current token, the end-of-input indicator shall be returned as the token.
(2)  If the previous character was used as part of an operator and the current character is not quoted and can be used with the current characters to form an operator, it shall be used as part of that (operator) token.

(3)  If the previous character was used as part of an operator and the current character cannot be used with the current characters to form an operator, the operator containing the previous character shall be delimited.

(4)  If the current character is backslash, single-quote, or double-quote (\, ', or ") and it is not quoted, it shall affect quoting for subsequent character(s) up to the end of the quoted text. The rules for quoting are as described in 3.2. During token recognition no substitutions shall actually be performed, and the result token shall contain exactly the characters that appear in the input (except for <newline> joining), unmodified, including any embedded or enclosing quotes or substitution operators, between the quote mark and the end of the quoted text. The token shall not be delimited by the end of the quoted field.

(5)  If the current character is an unquoted $ or `, the shell shall identify the start of any candidates for parameter expansion (3.6.2), command substitution (3.6.3), or arithmetic expansion (3.6.4) from their introductory unquoted character sequences: $ or ${, $( or `, and $((, respectively. The shell shall read sufficient input to determine the end of the unit to be expanded (as explained in the cited subclauses). While processing the characters, if instances of expansions or quoting are found nested within the substitution, the shell shall recursively process them in the manner specified for the construct that is found.
     The characters found from the beginning of the substitution to its end, allowing for any recursion necessary to recognize embedded constructs, shall be included unmodified in the result token, including any embedded or enclosing substitution operators or quotes. The token shall not be delimited by the end of the substitution.

(6)  If the current character is not quoted and can be used as the first character of a new operator, the current token (if any) shall be delimited. The current character shall be used as the beginning of the next (operator) token.

(7)  If the current character is an unquoted <newline>, the current token shall be delimited.

(8)  If the current character is an unquoted <blank>, any token containing the previous character is delimited and the current character is discarded.

(9)  If the previous character was part of a word, the current character is appended to that word.

(10) If the current character is a #, it and all subsequent characters up to, but excluding, the next <newline> are discarded as a comment. The <newline> that ends the line is not considered part of the comment.

(11) The current character is used as the start of a new word.

Once a token is delimited, it shall be categorized as required by the grammar in 3.10.

BEGIN_RATIONALE
3.3.1 Token Recognition Rationale. (This subclause is not a part of P1003.2)

The (3) rule about combining characters to form operators is not meant to preclude systems from extending the shell language when characters are combined in otherwise invalid ways. Portable applications cannot use invalid combinations and test suites should not penalize systems that take advantage of this fact. For example, the unquoted combination |& is not valid in a POSIX.2 script, but has a specific KornShell meaning.
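Rules (5), (9), and (10) of 3.3 can be observed directly. The following sketch assumes a POSIX-conforming sh. The functions comment_demo and continued_demo are hypothetical helpers introduced only for this illustration.

```shell
#!/bin/sh
# Rule (9): a # that does not begin a token is appended to the current word.
hash_in_word=$(echo a#b)

# Rule (10): a # at the start of a token begins a comment that runs up to,
# but not including, the next <newline>.
comment_demo() {
    echo a #b c
}
comment_out=$(comment_demo)

# An escaped <newline> does not continue a comment, so the echo still runs.
continued_demo() {
    # this comment ends with a backslash \
    echo not-in-comment
}
continued_out=$(continued_demo)

# Rule (5): quotes nested inside $( ) belong to the command substitution,
# so the inner "*" stays quoted and is not subject to pathname expansion.
star="$(echo "*")"

printf '%s\n' "$hash_in_word" "$comment_out" "$continued_out" "$star"
```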
Rule (10) applies only when the # is the first character of a new token being assembled. A # in the middle of a word has already been appended to that word by rule (9), so # starts a comment only when it is at the beginning of a token. Rule (10) is also written to indicate that the search for the end-of-comment does not treat an escaped <newline> specially, so a comment cannot be continued to the next line.
END_RATIONALE

3.4 Reserved Words

Reserved words are words that have special meaning to the shell. (See 3.9.) The following words shall be recognized as reserved words:

    !       elif    fi      in      while
    case    else    for     then    {  (footnote 4)
    do      esac    if      until   }
    done

This recognition shall occur only when none of the characters are quoted and when the word is used as:

(1)  The first word of a command

(2)  The first word following one of the reserved words other than case, for, or in

(3)  The third word in a case or for command (only in is valid in this case)

See the grammar in 3.10.

The following words may be recognized as reserved words on some systems (when none of the characters are quoted), causing unspecified results:

    function    select    [[    ]]

Words that are the concatenation of a name and a colon (:) are reserved; their use produces unspecified results.

BEGIN_RATIONALE
3.4.1 Reserved Words Rationale. (This subclause is not a part of P1003.2)

All reserved words are recognized syntactically as such in the contexts described. However, it is useful to point out that in is the only meaningful reserved word after a case or for; similarly, in is not meaningful as the first word of a simple command.

Reserved words are recognized only when they are delimited (i.e., meet the definition of word; see 3.1.16), whereas operators are themselves delimiters. For instance, ( and ) are control operators, so no <space> is needed in (list).
However, { and } are reserved words in { list;}, so in this case the leading <space> and the semicolon are required.

Footnote 4: In some historical systems, the curly braces are treated as control operators. To assist in future standardization activities, portable applications should avoid using unquoted braces to represent the characters themselves. A future version of POSIX.2 may require this, although probably not for the often-used find {} construct.

The list of unspecified reserved words is from the KornShell, so portable applications cannot use them in places where a reserved word would be recognized. This list contained time in earlier drafts, but it was removed when the time utility was selected for the UPE.

There was a strong argument for promoting braces to operators (instead of reserved words), so they would be syntactically equivalent to subshell operators. Concerns about compatibility outweighed the advantages of this approach. Nevertheless, portable applications should consider quoting { and } when they represent themselves.

The restriction on ending a name with a colon is to allow future implementations that support named labels for flow control. See the rationale for break (3.14.1.1).
END_RATIONALE

3.5 Parameters and Variables

A parameter can be denoted by a name, a number, or one of the special characters listed in 3.5.2. A variable is a parameter denoted by a name.

A parameter is set if it has an assigned value (null is a valid value). Once a variable is set, it can only be unset by using the unset special built-in command.

3.5.1 Positional Parameters

A positional parameter is a parameter denoted by the decimal value represented by one or more digits, other than the single digit 0.
When a positional parameter with more than one digit is specified, the application shall enclose the digits in braces (see 3.6.2). Positional parameters are initially assigned when the shell is invoked (see sh in 4.56), temporarily replaced when a shell function is invoked (see 3.9.5), and can be reassigned with the set special built-in command.

BEGIN_RATIONALE
3.5.1.1 Positional Parameters Rationale. (This subclause is not a part of P1003.2)

The digits denoting the positional parameters are always interpreted as a decimal value, even if there is a leading zero.
END_RATIONALE

3.5.2 Special Parameters

Listed below are the special parameters and the values to which they shall expand. Only the values of the special parameters are listed; see 3.6 for a detailed summary of all the stages involved in expanding words.

*    Expands to the positional parameters, starting from one. When the expansion occurs within a double-quoted string (see 3.2.3), it expands to a single field with the value of each parameter separated by the first character of the IFS variable, or by a <space> if IFS is unset.

@    Expands to the positional parameters, starting from one. When the expansion occurs within double-quotes, each positional parameter expands as a separate field, with the provision that the expansion of the first parameter is still joined with the beginning part of the original word (assuming that the expanded parameter was embedded within a word), and the expansion of the last parameter is still joined with the last part of the original word. If there are no positional parameters, the expansion of @ shall generate zero fields, even when @ is double-quoted.

#    Expands to the decimal number of positional parameters.

?
     Expands to the decimal exit status of the most recent pipeline (see 3.9.2).

-    (Hyphen.) Expands to the current option flags (the single-letter option names concatenated into a string) as specified on invocation, by the set special built-in command, or implicitly by the shell.

$    Expands to the decimal process ID of the invoked shell. In a subshell (see 3.12), $ shall expand to the same value as that of the current shell.

!    Expands to the decimal process ID of the most recent background command (see 3.9.3) executed from the current shell. For a pipeline, the process ID is that of the last command in the pipeline.

0    (Zero.) Expands to the name of the shell or shell script. See sh (4.56) for a detailed description of how this name is derived.

See the description of the IFS variable in 3.5.3.

BEGIN_RATIONALE
3.5.2.1 Special Parameters Rationale. (This subclause is not a part of P1003.2)

Most historical implementations implement subshells by forking; thus, the special parameter $ does not necessarily represent the process ID of the shell process executing the commands since the subshell execution environment preserves the value of $.

If a subshell were to execute a background command, the value of its parent's $! would not change. For example:

    (
        date &
        echo $!
    )
    echo $!

would echo two different values for $!.

The descriptions of parameters * and @ assume the reader is familiar with the field splitting discussion in 3.6.5 and understands that portions of the word will remain concatenated unless there is some reason to split them into separate fields.
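The subshell behavior of the $ and ! special parameters, described at the start of this rationale, can be checked with the following sketch. It assumes a POSIX-conforming sh and uses the null utility : as a stand-in background command. The variable names are illustrative.

```shell
#!/bin/sh
parent_pid=$$
# Command substitution runs in a subshell, yet $$ keeps the parent's value.
subshell_pid=$(echo $$)

# Start a background command from the current shell and record its PID.
: &
bg_pid=$!
wait

# A background command started inside a subshell does not change the
# parent's $!.
( : & wait )
after_subshell=$!

printf 'shell %s, subshell %s\n' "$parent_pid" "$subshell_pid"
```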
Some examples of the * and @ properties, including the concatenation aspects:

    set "abc" "def ghi" "jkl"

    echo $*        => "abc" "def" "ghi" "jkl"
    echo "$*"      => "abc def ghi jkl"
    echo $@        => "abc" "def" "ghi" "jkl"

but

    echo "$@"      => "abc" "def ghi" "jkl"
    echo "xx$@yy"  => "xxabc" "def ghi" "jklyy"
    echo "$@$@"    => "abc" "def ghi" "jklabc" "def ghi" "jkl"

In the preceding examples, the double-quote characters that appear after the => do not appear in the output and are used only to illustrate word boundaries.

Historical versions of the Bourne shell have used <space> as a separator between the expanded members of "$*". The KornShell has used the first character in IFS, which is <space> by default. If IFS is set to a null string, this is not equivalent to unsetting it; its first character will not exist, so the parameter values are concatenated. For example:

    $ IFS=''
    $ set foo bar bam
    $ echo "$@"
    foo bar bam
    $ echo "$*"
    foobarbam
    $ unset IFS
    $ echo "$*"
    foo bar bam

The $- can be used to save and restore set options:

    Save=$(echo $- | sed 's/[ics]//g')
    ...
    set +aCefnuvx
    set -$Save

The three options are removed using sed in the example because they may appear in the value of $- (from the sh command line), but are not valid options to set.

The command name (parameter 0) is not counted in the number given by # because it is a special parameter, not a positional parameter.
END_RATIONALE

3.5.3 Variables

Variables shall be initialized from the environment (as defined by POSIX.1 {8}) and can be given new values with variable assignment commands. If a variable is initialized from the environment, it shall be marked for export immediately; see 3.14.8.
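The export rule above can be observed by starting a shell whose environment contains a variable and letting that shell run a further shell. This sketch assumes sh and env are available on PATH. GREETING, POSIX_DEMO_LOCAL, and the show_* command strings are hypothetical names chosen for the illustration.

```shell
#!/bin/sh
# Hypothetical command strings: each prints a variable's value, or "unset".
show_greeting='printf %s "${GREETING-unset}"'
show_local='printf %s "${POSIX_DEMO_LOCAL-unset}"'
unset POSIX_DEMO_LOCAL   # make sure it is not inherited from our environment

# GREETING reaches the middle sh through its environment, so that sh marks
# it for export; the reassigned value is therefore passed to the inner sh.
from_env=$(env GREETING=hi sh -c 'GREETING=bye; sh -c "$1"' sh "$show_greeting")

# POSIX_DEMO_LOCAL is only assigned in the middle sh and never exported,
# so the inner sh does not see it.
not_exported=$(sh -c 'POSIX_DEMO_LOCAL=x; sh -c "$1"' sh "$show_local")

printf '%s %s\n' "$from_env" "$not_exported"
```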
New variables can be defined and initialized with variable assignments, with the read or getopts utilities, with the name parameter in a for loop (see 3.9.4.2), with the ${name=word} expansion, or with other mechanisms provided as implementation extensions.

The following variables shall affect the execution of the shell:

HOME
     This variable shall be interpreted as the pathname of the user's home directory. The contents of HOME are used in Tilde Expansion (see 3.6.1).

IFS
     Input field separators: a string treated as a list of characters that is used for field splitting and to split lines into fields with the read command. If IFS is not set, the shell shall behave as if the value of IFS were the <space>, <tab>, and <newline> characters. (See 3.6.5.)

LANG
     This variable shall provide a default value for the LC_* variables, as described in 2.6.

LC_ALL
     This variable shall interact with the LANG and LC_* variables as described in 2.6.

LC_COLLATE
     This variable shall determine the behavior of range expressions, equivalence classes, and multicharacter collating elements within pattern matching.

LC_CTYPE
     This variable shall determine the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters), which characters are defined as letters (character class alpha), and the behavior of character classes within pattern matching.

LC_MESSAGES
     This variable shall determine the language in which messages should be written.

PATH
     This variable represents a string formatted as described in 2.6, used to effect command interpretation. See 3.9.1.1.

BEGIN_RATIONALE
3.5.3.1 Variables Rationale.
(This subclause is not a part of P1003.2)

A description of PWD (which is automatically set by the KornShell whenever the current working directory changes) was omitted because its functionality is easily reproduced using $(pwd).

See the discussion of IFS in 3.6.5.1.

Other common environment variables used by historical shells are not specified by this standard, but they should be reserved for the historical uses.

For interactive use, other shell variables are expected to be introduced by the UPE (and this rationale will be updated accordingly): ENV, FCEDIT, HISTFILE, HISTSIZE, LINENO, PPID, PS1, PS2, PS4.

Tilde expansion for components of the PATH in an assignment such as:

    PATH=~hlj/bin:~dwc/bin:$PATH

is a feature of some historical shells and is allowed by the wording of 3.6.1. Note that the tildes are expanded during the assignment to PATH, not when PATH is accessed during command search.
END_RATIONALE

3.6 Word Expansions

This clause describes the various expansions that are performed on words. Not all expansions are performed on every word, as explained in the following subclauses.

Tilde expansions, parameter expansions, command substitutions, arithmetic expansions, and quote removals that occur within a single word expand to a single field. It is only field splitting or pathname expansion that can create multiple fields from a single word. The single exception to this rule is the expansion of the special parameter @ within double-quotes, as is described in 3.5.2.

The order of word expansion shall be as follows:

(1)  Tilde Expansion (see 3.6.1), Parameter Expansion (see 3.6.2), Command Substitution (see 3.6.3), and Arithmetic Expansion (see 3.6.4) shall be performed, beginning to end. [See item (5) in 3.3.]
(2)  Field Splitting (see 3.6.5) shall be performed on fields generated by step (1) unless IFS is null.

(3)  Pathname Expansion (see 3.6.6) shall be performed, unless set -f is in effect.

(4)  Quote Removal (see 3.6.7) shall always be performed last.

The expansions described in this clause shall occur in the same shell environment as that in which the command is executed.

If the complete expansion appropriate for a word results in an empty field, that empty field shall be deleted from the list of fields that form the completely expanded command, unless the original word contained single-quote or double-quote characters.

The $ character is used to introduce parameter expansion, command substitution, or arithmetic evaluation. If an unquoted $ is followed by a character that is not one of the following, the result is unspecified: a numeric character, the name of one of the special parameters (see 3.5.2), a valid first character of a variable name, a left curly brace ({), or a left parenthesis.

BEGIN_RATIONALE
3.6.0.1 Word Expansions Rationale. (This subclause is not a part of P1003.2)

IFS is used for performing field splitting on the results of parameter and command substitution; it is not used for splitting all fields. Previous versions of the shell used it for splitting all fields during field splitting, but this has severe problems because the shell can no longer parse its own script. There are also important security implications caused by this behavior. All useful applications of IFS use it for parsing input of the read utility and for splitting the results of parameter and command substitution.

New versions of the shell have fixed this bug, and POSIX.2 requires the corrected behavior.
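The field-splitting and empty-field rules can be checked with a small function that counts its arguments. This is a sketch that assumes a POSIX-conforming sh. count_fields is a hypothetical helper, not a standard utility.

```shell
#!/bin/sh
# Hypothetical helper: print how many fields (arguments) it received.
count_fields() { echo $#; }

IFS=:
v=a:b:c
# IFS splits only the results of expansions: literal x:y stays one field,
# while $v yields the three fields a, b, and c.
n_split=$(count_fields x:y $v)
# Inside double-quotes no field splitting is performed.
n_quoted=$(count_fields "$v")
# An unquoted expansion that yields an empty field is deleted, but a word
# containing quote characters ('') is kept as an empty field.
empty=
n_empty=$(count_fields $empty '' x)
unset IFS   # restore the default field-splitting behavior

printf '%s %s %s\n' "$n_split" "$n_quoted" "$n_empty"
```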
The rule concerning expansion to a single field requires that, if foo=abc and bar=def,

    "$foo""$bar"

expands to the single field

    abcdef

The rule concerning empty fields can be illustrated by:

    $ unset foo
    $ set $foo bar '' xyz "$foo" abc
    $ for i
    > do
    >     echo "-$i-"
    > done
    -bar-
    --
    -xyz-
    --
    -abc-

Step (1) indicates that Tilde Expansion, Parameter Expansion, Command Substitution, and Arithmetic Expansion are all processed simultaneously as they are scanned. For example, the following is valid arithmetic:

    x=1
    echo $(( $(echo 3)+$x ))

An earlier draft stated that Tilde Expansion preceded the other steps, but this is not the case in known historical implementations; if it were, and a referenced home directory contained a $ character, expansions would result within the directory name.
END_RATIONALE

3.6.1 Tilde Expansion

A tilde-prefix consists of an unquoted tilde character at the beginning of a word, followed by all of the characters preceding the first unquoted slash in the word, or all the characters in the word if there is no slash. In an assignment (see 3.1.15), multiple tilde prefixes can be used: at the beginning of the word (i.e., following the equals-sign of the assignment) and/or following any unquoted colon. A tilde prefix in an assignment is terminated by the first unquoted colon or slash. If none of the characters in the tilde-prefix are quoted, the characters in the tilde-prefix following the tilde shall be treated as a possible login name from the user database (see POSIX.1 {8} Section 9). A portable login name cannot contain characters outside the set given in the description of the LOGNAME environment variable in POSIX.1 {8}.
If the login name is null (i.e., the tilde-prefix contains only the tilde), the tilde-prefix shall be replaced by the value of the variable HOME. If HOME is unset, the results are unspecified. Otherwise, the tilde-prefix shall be replaced by a pathname of the home directory associated with the login name obtained using the equivalent of the POSIX.1 {8} getpwnam() function. If the system does not recognize the login name, the results are undefined.

BEGIN_RATIONALE
3.6.1.1 Tilde Expansion Rationale. (This subclause is not a part of P1003.2)

The text about quoting of the word indicates that \~hlj/, ~h\lj/, ~"hlj"/, ~hlj\/, and ~hlj/ are not equivalent: only the last will cause tilde expansion.

Tilde expansion generally occurs only at the beginning of words, but POSIX.2 has adopted an exception based on historical practice in the KornShell:

    PATH=/posix/bin:~dgk/bin

is eligible for tilde expansion because tilde follows a colon and none of the relevant characters is quoted. Consideration was given to prohibiting this behavior because any of the following are reasonable substitutes:

    PATH=$(printf %s: ~rms/bin ~bfox/bin ...)

    PATH=$(printf %s ~karels/bin : ~bostic/bin)

    for Dir in ~maart/bin ~srb/bin ...
    do
        PATH=${PATH:+$PATH:}$Dir
    done

(In the first command, any number of directory names are concatenated and separated with colons, but it may be undesirable to end the variable with a colon because this is an obsolescent means to include dot at the end of the PATH. In the second, explicit colons are used for each directory. In all cases, the shell performs tilde expansion on each directory because all are separate words to the shell.)
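Tilde expansion in assignments, including the exception for a tilde that follows a colon, can be observed without relying on any real login name by using the null login name, which is replaced by the value of HOME. In this sketch, HOME is deliberately set to the hypothetical directory /home/example; no file-system access takes place.

```shell
#!/bin/sh
HOME=/home/example        # hypothetical home directory for this sketch
after_colon=/bin:~/bin    # tilde-prefix after an unquoted ':' in an assignment
at_start=~/lib            # tilde-prefix at the start of the assigned value
quoted="/bin:~/bin"       # quoted, so no tilde expansion is performed
printf '%s\n' "$after_colon" "$at_start" "$quoted"
```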
The exception was included to avoid breaking numerous KornShell scripts and interactive users, even though variable assignments in scripts derived from other systems will have to use quoting in some cases to allow literal tildes in strings. (This latter problem should be relatively rare because only tildes preceding known login names in unquoted strings are affected.)

Note that expressions in operands such as

    make -k mumble LIBDIR=~chet/lib

do not qualify as shell variable assignments and tilde expansion is not performed (unless the command does so itself, which make does not).

In an earlier draft, tilde expansion occurred following any unquoted equals-sign or colon, but this was removed because of its complexity and to avoid breaking commands such as:

    rcp hostname:~marc/.profile .

A suggestion was made that the special sequence ``$~'' should be allowed to force tilde expansion anywhere. Since this is not historical practice, it has been left for future implementations to evaluate. (The description in 3.2 requires that a dollar-sign be quoted to represent itself, so the $~ combination is already unspecified.)

The results of giving tilde with an unknown login name are undefined because the KornShell ~+ and ~- constructs make use of this condition, but in general it is an error to give an incorrect login name with tilde. The results of having HOME unset are unspecified because some historical shells treat this as an error.
END_RATIONALE

3.6.2 Parameter Expansion

The format for parameter expansion is as follows:

    ${expression}

where expression consists of all characters until the matching }.
Any } escaped by a backslash or within a quoted string, and characters in embedded arithmetic expansions, command substitutions, and variable expansions, shall not be examined in determining the matching }.

The simplest form for parameter expansion is:

    ${parameter}

The value, if any, of parameter shall be substituted.

The parameter name or symbol can be enclosed in braces, which are optional except for positional parameters with more than one digit or when parameter is followed by a character that could be interpreted as part of the name. The matching closing brace shall be determined by counting brace levels, skipping over enclosed quoted strings and command substitutions.

If the parameter name or symbol is not enclosed in braces, the expansion shall use the longest valid name (see 3.1.5), whether or not the symbol represented by that name exists.

If a parameter expansion occurs inside double-quotes:

     - Pathname expansion shall not be performed on the results of the expansion.

     - Field splitting shall not be performed on the results of the expansion, with the exception of @; see 3.5.2.

In addition, a parameter expansion can be modified by using one of the following formats. In each case that a value of word is needed (based on the state of parameter, as described below), word shall be subjected to tilde expansion, parameter expansion, command substitution, and arithmetic expansion. If word is not needed, it shall not be expanded. The } character that delimits the following parameter expansion modifications shall be determined as described previously in this subclause and in 3.2.3. (For example, ${foo-bar}xyz} would result in the expansion of foo followed by the string xyz} if foo is set, else the string barxyz}.)
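The rule for finding the delimiting } can be checked with the ${foo-bar}xyz} example above, and with a } that appears inside a command substitution. This is a minimal sketch that assumes a POSIX-conforming sh. The names foo and nothere are illustrative only.

```shell
#!/bin/sh
unset foo nothere
# The first unescaped, unquoted } ends the expansion; "xyz}" is literal text.
unset_case=${foo-bar}xyz}
foo=set
set_case=${foo-bar}xyz}

# A } inside an embedded command substitution is not examined when looking
# for the matching }, so the word expands to the command's output.
from_cmd=${nothere-$(echo '}')}

printf '%s\n' "$unset_case" "$set_case" "$from_cmd"
```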
${parameter:-word}
        Use Default Values. If parameter is unset or null, the expansion of word shall be substituted; otherwise, the value of parameter shall be substituted.

${parameter:=word}
        Assign Default Values. If parameter is unset or null, the expansion of word shall be assigned to parameter. In all cases, the final value of parameter shall be substituted. Only variables, not positional parameters or special parameters, can be assigned in this way.

${parameter:?[word]}
        Indicate Error if Null or Unset. If parameter is unset or null, the expansion of word (or a message indicating it is unset if word is omitted) shall be written to standard error and the shell shall exit with a nonzero exit status. Otherwise, the value of parameter shall be substituted. An interactive shell need not exit.

${parameter:+word}
        Use Alternate Value. If parameter is unset or null, null shall be substituted; otherwise, the expansion of word shall be substituted.

In the parameter expansions shown previously, use of the colon in the format results in a test for a parameter that is unset or null; omission of the colon results in a test for a parameter that is only unset.

${#parameter}
        String Length. The length in characters of the value of parameter shall be substituted. If parameter is * or @, the result of the expansion is unspecified.

The following four varieties of parameter expansion provide for substring processing. In each case, pattern matching notation (see 3.13), rather than regular expression notation, shall be used to evaluate the patterns. If parameter is * or @, the result of the expansion is unspecified.
Enclosing the full parameter expansion string in double-quotes shall not cause the following four varieties of pattern characters to be quoted, whereas quoting characters within the braces shall have this effect.

${parameter%word}
        Remove Smallest Suffix Pattern. The word shall be expanded to produce a pattern. The parameter expansion then shall result in parameter, with the smallest portion of the suffix matched by the pattern deleted.

${parameter%%word}
        Remove Largest Suffix Pattern. The word shall be expanded to produce a pattern. The parameter expansion then shall result in parameter, with the largest portion of the suffix matched by the pattern deleted.

${parameter#word}
        Remove Smallest Prefix Pattern. The word shall be expanded to produce a pattern. The parameter expansion then shall result in parameter, with the smallest portion of the prefix matched by the pattern deleted.

${parameter##word}
        Remove Largest Prefix Pattern. The word shall be expanded to produce a pattern. The parameter expansion then shall result in parameter, with the largest portion of the prefix matched by the pattern deleted.

BEGIN_RATIONALE

3.6.2.1 Parameter Expansion Rationale. (This subclause is not a part of P1003.2)

When the shell is scanning its input to determine the boundaries of a name, it is not bound by its knowledge of what names are already defined. For example, if F is a defined shell variable, the command "echo $Fred" does not echo the value of $F followed by red; it selects the longest possible valid name, Fred, which in this case might be unset.
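The four pattern-removal forms defined in 3.6.2 can be tried directly. In this sketch, which is not part of this standard, the pathname and the variable names are hypothetical; the patterns use the notation of 3.13, in which * matches any string, including the empty string.

```shell
path=/usr/local/src/file.tar.gz

small_suffix=${path%.*}    # shortest ".*" suffix removed: /usr/local/src/file.tar
large_suffix=${path%%.*}   # longest ".*" suffix removed: /usr/local/src/file
small_prefix=${path#*/}    # shortest "*/" prefix removed: usr/local/src/file.tar.gz
large_prefix=${path##*/}   # longest "*/" prefix removed: file.tar.gz

# Quoting inside the braces makes the pattern characters literal.
s='a*b'
literal=${s#"a*"}          # removes the two characters "a*": b
pattern=${s#a*}            # a* matches the shortest prefix "a": *b

printf '%s\n' "$small_suffix" "$large_suffix" "$small_prefix" \
    "$large_prefix" "$literal" "$pattern"
```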
The rule for finding the closing } in ${...} is the one used in the KornShell and is upward compatible with the Bourne shell, which does not determine the closing } until the word is expanded. The advantage of this is that incomplete expansions, such as ${foo, can be determined during tokenization, rather than during expansion.

The four expansions with the optional colon have been hard to understand from the historical documentation. The following table summarizes the effect of the colon:

                          parameter             parameter             parameter
                          set and not null      set but null          unset
                          ----------------      ------------          ---------
    ${parameter:-word}    substitute parameter  substitute word       substitute word
    ${parameter-word}     substitute parameter  substitute null       substitute word
    ${parameter:=word}    substitute parameter  assign word           assign word
    ${parameter=word}     substitute parameter  substitute parameter  assign word
    ${parameter:?word}    substitute parameter  error, exit           error, exit
    ${parameter?word}     substitute parameter  substitute null       error, exit
    ${parameter:+word}    substitute word       substitute null       substitute null
    ${parameter+word}     substitute word       substitute word       substitute null

In all cases shown with "substitute," the expression is replaced with the value shown. In all cases shown with "assign," parameter is assigned that value, which also replaces the expression.
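Each row of the table can be checked with a few assignments. This sketch is not part of this standard; the variable names (nonnull, null, unset_var, a, b, c, q) are hypothetical. The :? case is run in a subshell so that its error exit does not terminate the rest of the script (see 3.8.1 on errors in subshells).

```shell
nonnull=value
null=
unset unset_var

# ":-" tests for unset or null; "-" tests only for unset.
d1=${nonnull:-word}    # value
d2=${null:-word}       # word
d3=${unset_var:-word}  # word
d4=${null-word}        # null (empty)
d5=${unset_var-word}   # word

# ":+" treats null like unset; "+" substitutes word for any set parameter.
p1=${null:+word}       # null (empty)
p2=${null+word}        # word
p3=${unset_var+word}   # null (empty)

# ":=" assigns when unset or null; "=" assigns only when unset.
a=; b=
unset c
: "${a:=assigned}"     # a becomes "assigned"
: "${b=assigned}"      # b stays null
: "${c=assigned}"      # c becomes "assigned"

# ":?" with a null parameter: the subshell exits with a nonzero status.
if (: "${null:?is null}") 2>/dev/null; then q=ok; else q=error; fi
```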
The string length and substring capabilities were included because of the demonstrated need for them, based on their usage in other shells, such as C-shell and KornShell.

Historical versions of the KornShell have not performed tilde expansion on the word part of parameter expansion; however, it is more consistent to do so.

Examples

${parameter:-word}
        In this example, ls is executed only if x is null or unset. [The $(ls) command substitution notation is explained in 3.6.3.]

            ${x:-$(ls)}

${parameter:=word}

            unset X
            echo ${X:=abc}
            abc

${parameter:?word}

            unset posix
            echo ${posix:?}
            sh: posix: parameter null or not set

${parameter:+word}

            set a b c
            echo ${3:+posix}
            posix

${#parameter}

            HOME=/usr/posix
            echo ${#HOME}
            10

${parameter%word}

            x=file.c
            echo ${x%.c}.o
            file.o

${parameter%%word}

            x=posix/src/std
            echo ${x%%/*}
            posix

${parameter#word}

            x=$HOME/src/cmd
            echo ${x#$HOME}
            /src/cmd

${parameter##word}

            x=/one/two/three
            echo ${x##*/}
            three

The double-quoting of patterns is different depending on where the double-quotes are placed:

    "${x#*}"      The asterisk is a pattern character.

    ${x#"*"}      The literal asterisk is quoted and not special.

END_RATIONALE

3.6.3 Command Substitution

Command substitution allows the output of a command to be substituted in place of the command name itself.
Command substitution shall occur when the command is enclosed as follows:

    $(command)

or (the "backquoted" version):

    `command`

The shell shall expand the command substitution by executing command in a subshell environment (see 3.12) and replacing the command substitution [the text of command plus the enclosing $( ) or backquotes] with the standard output of the command, removing sequences of one or more <newline>s at the end of the substitution. (Embedded <newline>s before the end of the output shall not be removed; however, during field splitting, they may be translated into <space>s, depending on the value of IFS and quoting that is in effect.)

Within the backquoted style of command substitution, backslash shall retain its literal meaning, except when followed by $ ` \ (dollar-sign, backquote, backslash). The search for the matching backquote shall be satisfied by the first backquote found without a preceding backslash; during this search, if a nonescaped backquote is encountered within a shell comment, a here-document, an embedded command substitution of the $(command) form, or a quoted string, undefined results occur. A single- or double-quoted string that begins, but does not end, within the `...` sequence produces undefined results.

With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command. Any valid shell script can be used for command, except:

  - A script consisting solely of redirections produces unspecified results.

  - See the restriction on single subshells described below.

The results of command substitution shall not be processed for further tilde expansion, parameter expansion, command substitution, or arithmetic expansion.
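The handling of <newline>s and the rule against re-expanding the output can be observed as follows. This sketch is not part of this standard; the variable names and strings are hypothetical, printf is used instead of echo because its treatment of \n is portable, and the field counts assume the default IFS.

```shell
# Three trailing newlines are removed; the embedded one is kept.
out=$(printf 'a\nb\n\n\n')
nl='
'
[ "$out" = "a${nl}b" ] && echo "only trailing newlines removed"

# The output is not expanded again: $x stays literal text.
x=hello
lit=$(printf '%s' '$x')

# Unquoted, the embedded newline separates fields.
set -- $(printf 'a\nb\n\n\n')
nfields=$#     # 2

# Inside double-quotes, no field splitting: a single field.
set -- "$(printf 'a\nb\n\n\n')"
qfields=$#     # 1
```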
If a command substitution occurs inside double-quotes, field splitting and pathname expansion shall not be performed on the results of the substitution.

Command substitution can be nested. To specify nesting within the backquoted version, the application shall precede the inner backquotes with backslashes; for example:

    \`command\`

If the command substitution consists of a single subshell, such as

    $( (command) )

a conforming application shall separate the $( and ( into two tokens (i.e., separate them with white space).

BEGIN_RATIONALE

3.6.3.1 Command Substitution Rationale. (This subclause is not a part of P1003.2)

The new $( ) form of command substitution was adopted from the KornShell to solve a problem of inconsistent behavior when using backquotes. For example:

    Command                 Output
    echo '\$x'              \$x
    echo `echo '\$x'`       $x
    echo $(echo '\$x')      \$x

Additionally, the backquoted syntax has historical restrictions on the contents of the embedded command. While the new $( ) form can process any kind of valid embedded script, the backquoted form cannot handle some valid scripts that include backquotes. For example, these otherwise valid embedded scripts do not work in the left column, but do work on the right:

    echo `                          echo $(
    cat <<\eof                      cat <<\eof
    a here-doc with `               a here-doc with )
    eof                             eof
    `                               )

    echo `                          echo $(
    echo abc # a comment with `     echo abc # a comment with )
    `                               )

    echo `                          echo $(
    echo '`'                        echo ')'
    `                               )

Some historical KornShell implementations did not process the first two examples correctly, but the author has agreed to make the appropriate modifications to do so.
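The difference between the two forms when nesting can be seen in a short comparison. This sketch is not part of this standard; the pathname is hypothetical, basename and dirname are the standard utilities, and printf stands in for echo so that backslash handling by echo itself does not affect the result.

```shell
p=/one/two/three/file.c

# The $( ) form nests directly; quoting at each level is independent.
new=$(basename "$(dirname "$p")")     # three

# In the backquoted form, the inner backquotes must be escaped.
old=`basename \`dirname $p\``         # three

# Inside backquotes, \$ becomes $ before the command runs; inside $( ) it does not.
bq=`printf '%s\n' '\$x'`              # $x
dp=$(printf '%s\n' '\$x')             # \$x
```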
The KornShell will also be modified so that the following works:

    echo $(
    case word in
    [Ff]oo) echo found foo ;;
    esac
    )

Because of these inconsistent behaviors, the backquoted variety of command substitution is not recommended for new applications that nest command substitutions or attempt to embed complex scripts. Because of its widespread historical use, particularly by interactive users, however, the backquotes were retained in POSIX.2 without being declared obsolescent.

The KornShell feature:

    If command is of the form <word, word is expanded to generate a pathname, and the value of the command substitution is the contents of this file with any trailing <newline>s deleted.

was omitted from this standard because $(cat word) is an appropriate substitute. However, to prevent breaking numerous scripts relying on this feature, it is unspecified to have a script within $( ) that has only redirections.

The requirement to separate $( and ( when a single subshell is command-substituted is to avoid any ambiguities with Arithmetic Expansion. See 3.6.4.1.

END_RATIONALE

3.6.4 Arithmetic Expansion

Arithmetic expansion provides a mechanism for evaluating an arithmetic expression and substituting its value. The format for arithmetic expansion shall be as follows:

    $((expression))

The expression shall be treated as if it were in double-quotes, except that a double-quote inside the expression is not treated specially. The shell shall expand all tokens in the expression for parameter expansion, command substitution, and quote removal.

Next, the shell shall treat this as an arithmetic expression and substitute the value of the expression.
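A few expansions illustrating the order of processing just described: parameter expansion and command substitution first, then arithmetic evaluation. This sketch is not part of this standard; the values are arbitrary, and the comments assume a shell that implements only integer arithmetic and C operator precedence.

```shell
x=7 y=2

# Parameter expansion happens first, then arithmetic evaluation.
sum=$(($x + $y))       # 9
quot=$(($x / $y))      # 3 (integer division)
rem=$(($x % $y))       # 1
prec=$((1 + 2 * 3))    # 7 (multiplication binds tighter)
gt=$(($x > $y))        # 1 (relational operators yield 1 or 0)

# A command substitution inside the expression is expanded first.
len=$(( $(printf 'abc' | wc -c) * 2 ))   # 6

# Double-quotes around the whole expansion do not change the value.
quoted="$(( $x * $y ))"                  # 14
```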
The arithmetic expression shall be processed according to the rules given in 2.9.2.1, with the following exceptions:

  (1) Only integer arithmetic is required.

  (2) The sizeof() operator and the prefix and postfix ++ and -- operators are not required.

  (3) Selection, Iteration, and Jump Statements are not supported.

As an extension, the shell may recognize arithmetic expressions beyond those listed. If the expression is invalid, the expansion fails and the shell shall write a message to standard error indicating the failure.

BEGIN_RATIONALE

3.6.4.1 Arithmetic Expansion Rationale. (This subclause is not a part of P1003.2)

Numerous ballots were received objecting to the inclusion of the (( )) form of KornShell arithmetic in previous drafts. The developers of the standard concluded that there is a strong desire for some kind of arithmetic evaluator to replace expr, and that tying it in with $ makes it fit in nicely with the standard shell language and provides access to arithmetic evaluation in places where accessing a utility would be inconvenient or clumsy.

Following long debate by interested members of the balloting group, the syntax and semantics for arithmetic were changed. The language is essentially a pure arithmetic evaluator of constants and operators (excluding assignment) and represents a simple subset of the previous arithmetic language [which was derived from the KornShell's (( )) construct]. The syntax was changed from that of a command denoted by ((expression)) to an expansion denoted by $((expression)). The new form is a dollar expansion ($) that evaluates the expression and substitutes the resulting value.
Objections to the previous style of arithmetic included that it was too complicated, did not fit in well with the shell's use of variables, and had a syntax that conflicted with subshells. The justification for the new syntax is that the shell is traditionally a macro language, and if a new feature is to be added, it should be done by extending the capabilities presented by the current model of the shell, rather than by inventing a new one outside the model: adding a new dollar expansion was perceived to be the most intuitive and least destructive way to add such a new capability.

In Drafts 9 and 10, a form $[expression] was used. It was functionally equivalent to the $(( )) of the current text, but objections were lodged that the 1988 KornShell had already implemented $(( )) and there was no compelling reason to invent yet another syntax. Furthermore, the $[] syntax had a minor incompatibility involving the patterns in case statements.

The portion of the C Standard {7} arithmetic operations selected corresponds to the operations historically supported in the KornShell.

A simple example using arithmetic expansion:

    # repeat a command 100 times
    x=100
    while [ $x -gt 0 ]
    do
        command
        x=$(($x-1))
    done

It was concluded that the test command ([) was sufficient for the majority of relational arithmetic tests, and that tests involving complicated relational expressions within the shell are rare, yet could still be accommodated by testing the value of $(( )) itself.
For example:

    # a complicated relational expression
    while [ $(( (($x + $y)/($a * $b)) < ($foo*$bar) )) -ne 0 ]

or better yet, the rare script that has many complex relational expressions could define a function like this:

    val() {
        return $((!$1))
    }

and complicated tests would be less intimidating:

    while val $(( (($x + $y)/($a * $b)) < ($foo*$bar) ))
    do
        # some calculations
    done

Another suggestion was to modify true and false to take an optional argument, and true would exit true only if the argument is nonzero, and false would exit false only if the argument is nonzero. The suggestion was not favorably received by the balloting group (those contacted were negative about it; all others were silent in their latest ballots).

    while true $(($x > 5 && $y <= 25))

There is a minor portability concern with the new syntax. The example $((2+2)) could have been intended to mean a command substitution of a utility named 2+2 in a subshell. The developers of POSIX.2 consider this to be obscure and isolated to some KornShell scripts [because $( ) command substitution existed previously only in the KornShell]. The text on Command Substitution has been changed to require that the $( and ( be separate tokens if this usage is needed.

An example such as

    echo $((echo hi);(echo there))

should not be misinterpreted by the shell as arithmetic because attempts to balance the parentheses pairs would indicate that they are subshells. However, as indicated by 3.1.1, a conforming application must separate two adjacent parentheses with white space to indicate nested subshells.

END_RATIONALE
3.6.5 Field Splitting

After parameter expansion (3.6.2), command substitution (3.6.3), and arithmetic expansion (3.6.4), the shell shall scan the results of expansions and substitutions that did not occur in double-quotes for field splitting, and multiple fields can result.

The shell shall treat each character of the IFS as a delimiter and use the delimiters to split the results of parameter expansion and command substitution into fields.

  (1) If the value of IFS is <space>, <tab>, and <newline>, or if it is unset, any sequence of <space>, <tab>, or <newline> characters at the beginning or end of the input shall be ignored and any sequence of those characters within the input shall delimit a field. (For example, the input

          <newline><space><tab>foo<tab><tab>bar<space>

      yields two fields, foo and bar.)

  (2) If the value of IFS is null, no field splitting shall be performed.

  (3) Otherwise, the following rules shall be applied in sequence. The term "IFS white space" is used to mean any sequence (zero or more instances) of white-space characters that are in the IFS value (e.g., if IFS contains <space>/<comma>/<tab>, any sequence of <space> and <tab> characters is considered IFS white space).

      (a) IFS white space shall be ignored at the beginning and end of the input.

      (b) Each occurrence in the input of an IFS character that is not IFS white space, along with any adjacent IFS white space, shall delimit a field, as described previously.

      (c) Nonzero-length IFS white space shall delimit a field.

BEGIN_RATIONALE

3.6.5.1 Field Splitting Rationale. (This subclause is not a part of P1003.2)

The operation of field splitting using IFS as described in earlier drafts was based on the way the KornShell splits words, but is incompatible with other common versions of the shell. However, each has merit, and so a decision was made to allow both. If the IFS variable is unset, or is <space><tab><newline>, the operation is equivalent to the way the
System V shell splits words. Using characters outside the <space><tab><newline> set yields the KornShell behavior, where each of the non-<space><tab><newline> characters is significant. This behavior, which affords the most flexibility, was taken from the way the original awk handled field splitting.

The (3) rule can be summarized as a pseudo ERE:

    (s*ns*|s+)

where s is an IFS white-space character and n is a character in the IFS that is not white space. Any string matching that ERE delimits a field, except that the s+ form does not delimit fields at the beginning or the end of a line. For example, if IFS is <space>/<comma>/<tab>, the string

    <space><space>red<space><space>,<space>white<space>blue

yields the three colors as the delimited fields.

END_RATIONALE

3.6.6 Pathname Expansion

After field splitting, if set -f is not in effect, each field in the resulting command line shall be expanded using the algorithm described in 3.13, qualified by the rules in 3.13.3.

3.6.7 Quote Removal

The quote characters \ ' " (backslash, single-quote, double-quote) that were present in the original word shall be removed unless they have themselves been quoted.

3.7 Redirection

Redirection is used to open and close files for the current shell execution environment (see 3.12) or for any command. Redirection operators can be used with numbers representing file descriptors (see the definition in POSIX.1 {8}) as described below. See also 2.9.1.

The relationship between these file descriptors and access to them in a programming language is specified in the language binding for that language to this standard.
The overall format used for redirection is:

    [n]redir-op word

The number n is an optional decimal number designating the file descriptor number; it shall be delimited from any preceding text and immediately precede the redirection operator redir-op. If n is quoted, the number shall not be recognized as part of the redirection expression. (For example, echo \2>a writes the character 2 into file a.) If any part of redir-op is quoted, no redirection expression shall be recognized. (For example, echo 2\>a writes the characters 2>a to standard output.) The optional number, redirection operator, and word shall not appear in the arguments provided to the command to be executed (if any).

In this standard, open files are represented by decimal numbers starting with zero. It is implementation defined what the largest value can be; however, all implementations shall support at least 0 through 9 for use by the application. These numbers are called file descriptors. The values 0, 1, and 2 have special meaning and conventional uses and are implied by certain redirection operations; they are referred to as standard input, standard output, and standard error, respectively. Programs usually take their input from standard input and write output on standard output. Error messages are usually written to standard error. The redirection operators can be preceded by one or more digits (with no intervening <blank>s allowed) to designate the file descriptor number.

If the redirection operator is << or <<-, the word that follows the redirection operator shall be subjected to quote removal; it is unspecified whether any of the other expansions occur.
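The quoting rules given above for n and redir-op can be checked in a scratch directory. This sketch is not part of this standard; the directory name under ${TMPDIR:-/tmp} and the file names a, b, and e are hypothetical.

```shell
dir=${TMPDIR:-/tmp}/redir-demo.$$     # hypothetical scratch directory
mkdir -p "$dir" || exit 1

# An unquoted 2 directly before > designates file descriptor 2; echo gets
# no arguments, so it writes an empty line to standard output (discarded).
echo 2>"$dir/e" >/dev/null

# A quoted 2 is an ordinary argument: the character 2 goes into file a.
echo \2>"$dir/a"

# Quoting part of the operator disables redirection: "2>.../b" is printed.
shown=$(echo 2\>"$dir/b")

in_a=$(cat "$dir/a")                  # 2
e_empty=no
[ -f "$dir/e" ] && [ ! -s "$dir/e" ] && e_empty=yes
b_made=no
[ -e "$dir/b" ] && b_made=yes

rm -r "$dir"
```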
For the other redirection operators, the word that follows the redirection operator shall be subjected to tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal. Pathname expansion shall not be performed on the word by a noninteractive shell; an interactive shell may perform it, but shall do so only when the expansion would result in one word.

If more than one redirection operator is specified with a command, the order of evaluation is from beginning to end.

In the following description of redirections, references are made to opening and creating files. These references shall conform to the requirements in 2.9.1.4. A failure to open or create a file shall cause the redirection to fail.

3.7.1 Redirecting Input

Input redirection shall cause the file whose name results from the expansion of word to be opened for reading on the designated file descriptor, or standard input if the file descriptor is not specified.

The general format for redirecting input is:

    [n]<word

where the optional n represents the file descriptor number. If the number is omitted, the redirection shall refer to standard input (file descriptor 0).

3.7.2 Redirecting Output

The two general formats for redirecting output are:

    [n]>word
    [n]>|word

where the optional n represents the file descriptor number. If the number is omitted, the redirection shall refer to standard output (file descriptor 1).

Output redirection using the > format shall fail if the noclobber option is set (see the description of set -C in 3.14.11) and the file named by the expansion of word exists and is a regular file.
Otherwise, redirection using the > or >| formats shall cause the file whose name results from the expansion of word to be created and opened for output on the designated file descriptor, or standard output if none is specified. If the file does not exist, it shall be created; otherwise, it shall be truncated to be an empty file after being opened.

3.7.3 Appending Redirected Output

Appended output redirection shall cause the file whose name results from the expansion of word to be opened for output on the designated file descriptor. The file is opened as if the POSIX.1 {8} open() function was called with the O_APPEND flag. If the file does not exist, it shall be created.

The general format for appending redirected output is as follows:

    [n]>>word

where the optional n represents the file descriptor number.

3.7.4 Here-Document

The redirection operators << and <<- both allow redirection of lines contained in a shell input file, known as a here-document, to the standard input of a command.

The here-document shall be treated as a single word that begins after the next <newline> and continues until there is a line containing only the delimiter, with no trailing <blank>s. Then the next here-document starts, if there is one. The format is as follows:

    [n]<<word
    here-document
    delimiter

If any character in word is quoted, the delimiter shall be formed by performing quote removal on word, and the here-document lines shall not be expanded. Otherwise, the delimiter shall be the word itself.

If no characters in word are quoted, all lines of the here-document shall be expanded for parameter expansion, command substitution, and arithmetic expansion. In this case, the backslash in the input shall behave as the backslash inside double-quotes (see 3.2.3).
However, the double-quote character (") shall not be treated specially within a here-document, except when the double-quote appears within $( ), ` `, or ${ }.

If the redirection symbol is <<-, all leading <tab> characters shall be stripped from input lines and the line containing the trailing delimiter. If more than one << or <<- operator is specified on a line, the here-document associated with the first operator shall be supplied first by the application and shall be read first by the shell.

3.7.5 Duplicating an Input File Descriptor

The redirection operator

    [n]<&word

is used to duplicate one input file descriptor from another, or to close one. If word evaluates to one or more digits, the file descriptor denoted by n, or standard input if n is not specified, shall be made to be a copy of the file descriptor denoted by word; if the digits in word do not represent a file descriptor already open for input, a redirection error shall result (see 3.8.1). If word evaluates to -, file descriptor n, or standard input if n is not specified, shall be closed. If word evaluates to something else, the behavior is unspecified.

3.7.6 Duplicating an Output File Descriptor

The redirection operator

    [n]>&word

is used to duplicate one output file descriptor from another, or to close one. If word evaluates to one or more digits, the file descriptor denoted by n, or standard output if n is not specified, shall be made to be a copy of the file descriptor denoted by word; if the digits in word do not represent a file descriptor already open for output, a redirection error shall result (see 3.8.1). If word evaluates to -, file descriptor n, or standard output if n is not specified, shall be closed.
If word evaluates to something else, the behavior is unspecified.

3.7.7 Open File Descriptors for Reading and Writing

The redirection operator

    [n]<>word

shall cause the file whose name is the expansion of word to be opened for both reading and writing on the file descriptor denoted by n, or standard input if n is not specified. If the file does not exist, it shall be created.

BEGIN_RATIONALE

3.7.8 Redirection Rationale. (This subclause is not a part of P1003.2)

In the C binding for POSIX.1 {8}, file descriptors are integers in the range 0 - ({OPEN_MAX}-1). The file descriptors discussed in Redirection are that same set of small integers.

As POSIX.2 is being finalized, it is not known how file descriptors will be represented in the language-independent description of POSIX.1 {8}. The current consensus appears to be that they will remain as small integers, but it is still possible that they will be defined as an opaque type. If they remain as integers, then the current POSIX.2 wording is acceptable. If they become an opaque type, then the C binding to POSIX.1 {8} will have to define the mapping from the binding's small integers to the opaque type, and the Redirection clause in POSIX.2 will have to be modified to specify that same mapping.

Having multidigit file descriptor numbers for I/O redirection can cause some obscure compatibility problems. Specifically, scripts that depend on an example command:

    echo 22>/dev/null

echoing "2" are somewhat broken to begin with. However, the file descriptor number still must be delimited from the preceding text. For example,

    cat file2>foo

will write the contents of file2, not the contents of file.

The >| format of output redirection was adopted from the KornShell.
Along with the noclobber option, set -C, it provides a safety feature to prevent inadvertent overwriting of existing files. (See the rationale with the pathchk utility for why this step was taken.) The restriction on regular files is historical practice.

The System V shell and the KornShell have differed historically on pathname expansion of word; the former never performed it, the latter only when the result was a single field (file). As a compromise, it was decided that the KornShell functionality was useful, but only as a shorthand device for interactive users. No reasonable shell script would be written with a command such as:

    cat foo > a*

Thus, shell scripts are prohibited from doing it, while interactive users can select the shell with which they are most comfortable.

The construct 2>&1 is often used to redirect standard error to the same file as standard output. Since the redirections take place beginning to end, the order of redirections is significant. For example:

    ls > foo 2>&1

directs both standard output and standard error to file foo. However,

    ls 2>&1 > foo

only directs standard output to file foo because standard error was duplicated as standard output before standard output was directed to file foo.

The <> operator is a feature first documented in the KornShell, but it has been silently present in both System V and BSD shells. It could be useful in writing an application that worked with several terminals, and occasionally wanted to start up a shell. That shell would in turn be unable to run applications that run from an ordinary controlling terminal unless it could make use of <> redirection. The specific example is a historical version of the pager more, which reads from standard error to
get its commands, so standard input and standard output are both available for their usual usage. There is no way of saying the following in the shell without <>:

    cat food | more - >/dev/tty03 2<>/dev/tty03

Another example of <> is one that opens /dev/tty on file descriptor 3 for reading and writing:

    exec 3<> /dev/tty

An example of creating a lock file for a critical code region:

    set -C
    until    2> /dev/null > lockfile
    do       sleep 30
    done
    set +C
    perform critical function
    rm lockfile

Since /dev/null is not a regular file, no error is generated by redirecting to it in noclobber mode.

The case of a missing delimiter at the end of a here-document is not specified. This is considered an error in the script (one that sometimes can be difficult to diagnose), although some systems have treated end-of-file as an implicit delimiter.

Tilde expansion is not performed on a here-document because the data is treated as if it were enclosed in double-quotes.
END_RATIONALE

3.8 Exit Status and Errors

3.8.1 Consequences of Shell Errors

For a noninteractive shell, an error condition encountered by a special built-in (see 3.14) or other type of utility shall cause the shell to write a diagnostic message to standard error and exit as shown in the following table:

    Error                           Special Built-in    Other Utilities
    ------------------------------  ------------------  ---------------
    Shell language syntax error     shall exit          shall exit
    Utility syntax error (option    shall exit          shall not exit
      or operand error)
    Redirection error               shall exit          shall not exit
    Variable assignment error       shall exit          shall not exit
    Expansion error                 shall exit          shall exit
    Command not found               n/a                 may exit
    dot script not found            shall exit          n/a

An "expansion error" is one that occurs when the shell expansions defined in 3.6 are carried out (e.g., ${x!y}, because ! is not a valid operator); an implementation may treat these as syntax errors if it is able to detect them during tokenization, rather than during expansion.

If any of the errors shown as "shall (may) exit" occur in a subshell, the subshell shall (may) exit with a nonzero status, but the script containing the subshell shall not exit because of the error.

In all of the cases shown in the table, an interactive shell shall write a diagnostic message to standard error without exiting.

3.8.2 Exit Status for Commands

Each command has an exit status that can influence the behavior of other shell commands. The exit statuses of commands that are not utilities are documented in this subclause. The exit statuses of the standard utilities are documented in their respective clauses.

If a command is not found by the shell, the exit status shall be 127. If the command name is found, but it is not an executable utility, the exit status shall be 126. See 3.9.1.1. Applications that invoke utilities without using the shell should use these exit status values to report similar errors.

If a command fails during word expansion or redirection, its exit status shall be greater than zero.

Internally, for purposes of deciding if a command exits with a nonzero exit status, the shell shall recognize the entire status value retrieved for the command by the equivalent of the POSIX.1 {8} wait() function WEXITSTATUS macro.
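The consequences listed in 3.8.1 and the exit statuses of 3.8.2 can be observed by running each case in a child sh, so that a shell exit required by the table does not terminate the calling script. This is a sketch under stated assumptions: the child sh is assumed to conform to this clause, /nonexistent/dir is assumed not to exist, and the command and file names are hypothetical.

```shell
#!/bin/sh
# Each case runs in a child sh so a required exit does not end this script.
# Assumption: /nonexistent/dir does not exist on the system running this.

# Redirection error on the special built-in ":" -> the shell shall exit.
special=$(sh -c ': > /nonexistent/dir/f; echo survived' 2>/dev/null)
# Redirection error on another utility ("true") -> the shell shall not exit.
regular=$(sh -c 'true > /nonexistent/dir/f; echo survived' 2>/dev/null)

# Command not found -> 127 (the command name is hypothetical).
notfound=0
sh -c 'no_such_command_for_this_sketch' 2>/dev/null || notfound=$?

# Found but not executable -> 126.
dir=${TMPDIR:-/tmp}/status_demo.$$
mkdir -p "$dir" && printf 'echo hi\n' > "$dir/noexec" && chmod a-x "$dir/noexec"
noexec=0
sh -c '"$1"' sh "$dir/noexec" 2>/dev/null || noexec=$?
rm -rf "$dir"

# Terminated by a signal -> a status greater than 128.
signalled=0
sh -c 'kill -KILL $$' || signalled=$?

echo "special=[$special] regular=[$regular] $notfound $noexec $signalled"
```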
When reporting the exit status with the special parameter ?, the shell shall report the full eight bits of exit status available. The exit status of a command that terminated because it received a signal shall be reported as greater than 128.

BEGIN_RATIONALE
3.8.3 Exit Status and Errors Rationale. (This subclause is not a part of P1003.2)

There is a historical difference in sh and ksh noninteractive error behavior. When a command named in a script is not found, some implementations of sh exit immediately, but ksh continues with the next command. Thus, POSIX.2 says that the shell "may" exit in this case. This puts a small burden on the programmer, who will have to test for successful completion following a command if it is important that the next command not be executed if the previous was not found. If it is important for the command to have been found, it was probably also important for it to complete successfully. The test for successful completion would not need to change.

Historically, shells have returned an exit status of 128+n, where n represents the signal number. Since signal numbers are not standardized, there is no portable way to determine which signal caused the termination. Also, it is possible for a command to exit with a status in the same range of numbers that the shell would use to report that the command was terminated by a signal. Implementations are encouraged to choose exit values greater than 256 to indicate programs that terminated by a signal so that the exit status cannot be confused with an exit status generated by a normal termination.

Historical shells make the distinction between "utility not found" and "utility found but cannot execute" in their error messages.
By specifying two seldom-used exit status values for these cases, 127 and 126 respectively, POSIX.2 gives an application the opportunity to make use of this distinction without having to parse an error message that would probably change from locale to locale. The POSIX.2 command, env, nohup, and xargs utilities also have been specified to use this convention.

When a command fails during word expansion or redirection, most historical implementations exit with a status of 1. However, there was some sentiment that this value should probably be much higher, so that an application could distinguish this case from the more normal exit status values. Thus, the language "greater than zero" was selected to allow either method to be implemented.
END_RATIONALE

3.9 Shell Commands

This clause describes the basic structure of shell commands. The following command descriptions each describe a format of the command that is only used to aid the reader in recognizing the command type, and does not formally represent the syntax. Each description discusses the semantics of the command; for a formal description of the command language, consult the grammar in 3.10.

A command is one of the following:

  - simple command (see 3.9.1)
  - pipeline (see 3.9.2)
  - list or compound-list (see 3.9.3)
  - compound command (see 3.9.4)
  - function definition (see 3.9.5).

Unless otherwise stated, the exit status of a command is that of the last simple command executed by the command. There is no limit on the size of any shell command other than that imposed by the underlying system (memory constraints, {ARG_MAX}, etc.).

BEGIN_RATIONALE
3.9.0.1 Shell Commands Rationale.
(This subclause is not a part of P1003.2)

A description of an "empty command" was removed from an earlier draft because it is only relevant in the cases of sh -c "", system(""), or an empty shell-script file (such as the implementation of true on some historical systems). Since it is no longer mentioned in POSIX.2, it falls into the silently unspecified category of behavior where implementations can continue to operate as they have historically, but conforming applications will not construct empty commands. (However, note that sh does explicitly state an exit status for an empty string or file.) In an interactive session or a script with other commands, extra <newline>s or semicolons, such as

    $ false
    $
    $ echo $?
    1

would not qualify as the empty command described here because they would be consumed by other parts of the grammar.
END_RATIONALE

3.9.1 Simple Commands

A simple command is a sequence of optional variable assignments and redirections, in any sequence, optionally followed by words and redirections, terminated by a control operator.

When a given simple command is required to be executed (i.e., when any conditional construct such as an AND-OR list or a case statement has not bypassed the simple command), the following expansions, assignments, and redirections shall all be performed from the beginning of the command text to the end.

(1) The words that are recognized as variable assignments or redirections according to 3.10.2 are saved for processing in steps (3) and (4).

(2) The words that are not variable assignments or redirections shall be expanded. If any fields remain following their expansion, the first field shall be considered the command name, and remaining fields shall be the arguments for the command.
(3) Redirections shall be performed as described in 3.7.

(4) Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value.

In the preceding list, the order of steps (3) and (4) may be reversed for the processing of special built-in utilities. See 3.14.

If no command name results, variable assignments shall affect the current execution environment. Otherwise, the variable assignments shall be exported for the execution environment of the command and shall not affect the current execution environment (except for special built-ins). If any of the variable assignments attempts to assign a value to a read-only variable, a variable assignment error shall occur. See 3.8.1 for the consequences of these errors.

If there is no command name, any redirections shall be performed in a subshell environment; it is unspecified whether this subshell environment is the same one as that used for a command substitution within the command. [To affect the current execution environment, see exec (3.14.6).] If any of the redirections performed in the current shell execution environment fail, the command shall immediately fail with an exit status greater than zero, and the shell shall write an error message indicating the failure. See 3.8.1 for the consequences of these failures on interactive and noninteractive shells.

If there is a command name, execution shall continue as described in 3.9.1.1. If there is no command name, but the command contained a command substitution, the command shall complete with the exit status of the last command substitution performed. Otherwise, the command shall complete with a zero exit status.

BEGIN_RATIONALE
3.9.1.0.1 Simple Commands Rationale.
(This subclause is not a part of P1003.2)

The enumerated list is used only when the command is actually going to be executed. For example, in:

    true || $foo *

no expansions are performed.

The following example illustrates both how a variable assignment without a command name affects the current execution environment, and how an assignment with a command name only affects the execution environment of the command.

    $ x=red
    $ echo $x
    red
    $ export x
    $ sh -c 'echo $x'
    red
    $ x=blue sh -c 'echo $x'
    blue
    $ echo $x
    red

This next example illustrates that redirections without a command name are still performed.

    $ ls foo
    ls: foo: no such file or directory
    $ > foo
    $ ls foo
    foo

Historical practice is for a command without a command name, but that includes a command substitution, to have an exit status of the last command substitution that the shell performed, and some historical scripts rely on this. For example:

    if      x=$(command)
    then    ...
    fi

An example of redirections without a command name being performed in a subshell shows that the here-document does not disrupt the standard input of the while loop:

    IFS=:
    while   read a b
    do      echo $a
            <<-eof
            Hello
            eof
    done </etc/passwd

The following are examples of commands without command names in AND-OR lists:

    > foo || {
        echo "error: foo cannot be created" >&2
        exit 1
    }

    # set saved if /vmunix.save exists
    test -f /vmunix.save && saved=1

Command substitution and redirections without command names both occur in subshells, but they are not necessarily the same ones. For example, in:

    exec 3> file
    var=$(echo foo >&3) 3>&1

it is unspecified whether foo will be echoed to the file or to standard output.
END_RATIONALE

3.9.1.1 Command Search and Execution

If a simple command results in a command name and an optional list of arguments, the following actions shall be performed.
(1) If the command name does not contain any slashes, the first successful step in the following sequence shall occur:

    (a) If the command name matches the name of a special built-in utility, that special built-in utility shall be invoked.

    (b) If the command name matches the name of a function known to this shell, the function shall be invoked as described in 3.9.5. [If the implementation has provided a standard utility in the form of a function, it shall not be recognized at this point. It shall be invoked in conjunction with the path search in step (1)(d).]

    (c) If the command name matches the name of a utility listed in Table 2-2 (see 2.3), that utility shall be invoked.

    (d) Otherwise, the command shall be searched for using the PATH environment variable as described in 2.6:

        [1] If the search is successful:

            [a] If the system has implemented the utility as a regular built-in or as a shell function, it shall be invoked at this point in the path search.

            [b] Otherwise, the shell shall execute the utility in a separate utility environment (see 3.12) with actions equivalent to calling the POSIX.1 {8} execve() function with the path argument set to the pathname resulting from the search, arg0 set to the command name, and the remaining arguments set to the operands, if any.

                If the execve() function fails due to an error equivalent to the POSIX.1 {8} error [ENOEXEC], the shell shall execute a command equivalent to having a shell invoked with the command name as its first operand, along with any remaining arguments passed along. If the executable file is not a text file, the shell may bypass this command execution, write an error message, and return an exit status of 126.
            Once a utility has been searched for and found (either as a result of this specific search or as part of an unspecified shell startup activity), an implementation may remember its location and need not search for the utility again unless the PATH variable has been the subject of an assignment. If the remembered location fails for a subsequent invocation, the shell shall repeat the search to find the new location for the utility, if any.

        [2] If the search is unsuccessful, the command shall fail with an exit status of 127 and the shell shall write an error message.

(2) If the command name does contain slashes, the shell shall execute the utility in a separate utility environment with actions equivalent to calling the POSIX.1 {8} execve() function with the path and arg0 arguments set to the command name, and the remaining arguments set to the operands, if any. If the execve() function fails due to an error equivalent to the POSIX.1 {8} error [ENOEXEC], the shell shall execute a command equivalent to having a shell invoked with the command name as its first operand, along with any remaining arguments passed along. If the executable file is not a text file, the shell may bypass this command execution, write an error message, and return an exit status of 126.

BEGIN_RATIONALE
3.9.1.1.1 Command Search and Execution Rationale. (This subclause is not a part of P1003.2)

This description requires that the shell can execute shell scripts directly, even if the underlying system does not support the common #! interpreter convention. That is, if file foo contains shell commands and is executable, the following will execute foo:

    ./foo

The command search shown here does not match all historical implementations.
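The search order of 3.9.1.1 can be observed with a hypothetical utility named greet that exists both as an executable text file without a #! line and as a shell function. This is a minimal sketch assuming a conforming sh and a writable scratch directory; greet and the directory name are invented for the example.

```shell
#!/bin/sh
# Hypothetical utility "greet" in a scratch directory placed first in PATH.
dir=${TMPDIR:-/tmp}/search_demo.$$
mkdir -p "$dir" || exit 1
oldpath=$PATH
PATH="$dir:$PATH"

# An executable text file without a "#!" line: execve() fails with ENOEXEC,
# so the shell runs the file as a shell script instead.
printf 'echo "script ran"\n' > "$dir/greet"
chmod +x "$dir/greet"

# Step (1)(b): a function is found before the PATH search of step (1)(d) ...
greet() { echo "function ran"; }
from_function=$(greet)          # function ran
# ... and the command utility suppresses the function lookup.
from_path=$(command greet)      # script ran
# A name containing a slash bypasses function lookup and PATH (step 2).
from_slash=$("$dir/greet")      # script ran

unset -f greet
PATH=$oldpath
rm -rf "$dir"
```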
A more typical historical sequence has been:

  - Any built-in, special or regular.
  - Functions.
  - Path search for executable files.

But there are problems with this sequence. Since the programmer has no idea in advance which utilities might have been built into the shell, a function cannot be used to portably override a utility of the same name. (For example, a function named cd cannot be written for many historical systems.) Furthermore, the PATH variable is partially ineffective in this case, and only a pathname with a slash can be used to ensure a specific executable file is invoked.

The sequence selected for POSIX.2 acknowledges that special built-ins cannot be overridden, but gives the programmer full control over which versions of other utilities are executed. It provides a means of suppressing function lookup (via the command utility; see 4.12) for the user's own functions and ensures that any regular built-ins or functions provided by the implementation are under the control of the path search. The mechanisms for associating built-ins or functions with executable files in the path are not specified by POSIX.2, but the wording requires that if either is implemented, the application will not be able to distinguish a function or built-in from an executable (other than in terms of performance, presumably). The implementation must ensure that all effects specified by POSIX.2 resulting from the invocation of the regular built-in or function (interaction with the environment, variables, traps, etc.) are identical to those resulting from the invocation of an executable file.

Example: Consider three versions of the ls utility:

  - The application includes a shell function named ls.
  - The user writes her own utility named ls and puts it in /hsa/bin.
  - The example implementation provides ls as a regular shell built-in that will be invoked (either by the shell or directly by exec) when the path search reaches the directory /posix/bin.

If PATH=/posix/bin, various invocations yield different versions of ls:

    Invocation                                        Version of ls
    ------------------------------------------------  ------------------
    ls (from within application script)               (1) function
    command ls (from within application script)       (3) built-in
    ls (from within makefile called by application)   (3) built-in
    system("ls")                                      (3) built-in
    PATH="/hsa/bin:$PATH" ls                          (2) user's version

After the execve() failure described, the shell normally executes the file as a shell script. Some implementations, however, attempt to detect whether the file is actually a script and not an executable from some other architecture. The method used by the KornShell is allowed by the text that indicates nontext files may be bypassed.
END_RATIONALE

3.9.2 Pipelines

A pipeline is a sequence of one or more commands separated by the control operator |. The standard output of all but the last command shall be connected to the standard input of the next command.

The format for a pipeline is:

    [!] command1 [ | command2 ...]

The standard output of command1 shall be connected to the standard input of command2. The standard input, standard output, or both of a command shall be considered to be assigned by the pipeline before any redirection specified by redirection operators that are part of the command (see 3.7).

If the pipeline is not in the background (see 3.9.3.1), the shell shall wait for the last command specified in the pipeline to complete, and may also wait for all commands to complete.

Exit Status

If the reserved word !
does not precede the pipeline, the exit status shall be the exit status of the last command specified in the pipeline. Otherwise, the exit status is the logical NOT of the exit status of the last command. That is, if the last command returns zero, the exit status shall be 1; if the last command returns greater than zero, the exit status is zero.

BEGIN_RATIONALE
3.9.2.1 Pipelines Rationale. (This subclause is not a part of P1003.2)

Because pipeline assignment of standard input or standard output or both takes place before redirection, it can be modified by redirection. For example:

    $ command1 2>&1 | command2

sends both the standard output and standard error of command1 to the standard input of command2.

The reserved word ! was added to allow more flexible testing using AND and OR lists.

It was suggested that it would be better to return a nonzero value if any command in the pipeline terminates with nonzero status (perhaps the bitwise OR of all return values). However, the last-specified-command semantics are historical practice, and changing them would cause application breakage. An example of historical (and POSIX.2) behavior:

    $ sleep 5 | (exit 4)
    $ echo $?
    4
    $ (exit 4) | sleep 5
    $ echo $?
    0
END_RATIONALE

3.9.3 Lists

An AND-OR-list is a sequence of one or more pipelines separated by the operators:

    &&    ||

A list is a sequence of one or more AND-OR-lists separated by the operators:

    ;    &

and optionally terminated by:

    ;    &    <newline>

The operators && and || shall have equal precedence and shall be evaluated from beginning to end.

A ; or <newline> terminator shall cause the preceding AND-OR-list to be executed sequentially; an & shall cause asynchronous execution of the preceding AND-OR-list.
The term compound-list is derived from the grammar in 3.10; it is equivalent to a sequence of lists, separated by <newline>s, that can be preceded or followed by an arbitrary number of <newline>s.

BEGIN_RATIONALE
3.9.3.0.1 Lists Rationale. (This subclause is not a part of P1003.2)

The equal precedence of && and || is historical practice. The developers of the standard evaluated the model used more frequently in high-level programming languages, such as C, to allow the shell logical operators to be used for complex expressions in an unambiguous way, but could not in the end allow existing scripts to break in the subtle way unequal precedence might cause. Some arguments were posed concerning the { } or ( ) groupings that are required historically. There are some disadvantages to these groupings:

  - The ( ) can be expensive, as they spawn other processes on some systems. This performance concern is primarily an implementation issue.

  - The { } braces are not operators (they are reserved words) and require a trailing space after each {, and a semicolon before each }. Most programmers (and certainly interactive users) have avoided braces as grouping constructs because of the irritating syntax required.

Braces were not changed to operators because that would generate compatibility issues even greater than the precedence question; braces appear outside the context of a keyword in many shell scripts.

An example reiterates the precedence of the lists as they associate from beginning to end.
Both of the following commands write solely bar to standard output:

    false && echo foo || echo bar
    true || echo foo && echo bar

The following is an example that illustrates <newline>s in compound-lists:

    while
            # a couple of newlines

            # a list
            date && who || ls; cat file
            # a couple of newlines

            # another list
            wc file > output & true

    do
            # 2 lists
            ls
            cat file
    done
END_RATIONALE

3.9.3.1 Asynchronous Lists

If a command is terminated by the control operator ampersand (&), the shell shall execute the command asynchronously in a subshell. This means that the shell shall not wait for the command to finish before executing the next command.

The format for running a command in background is:

    command1 & [command2 & ...]

The standard input for an asynchronous list, before any explicit redirections are performed, shall be considered to be assigned to a file that has the same properties as /dev/null. If it is an interactive shell, this need not happen. In all cases, explicit redirection of standard input shall override this activity.

When an element of an asynchronous list (the portion of the list ended by an ampersand, such as command1, above) is started by the shell, the process ID of the last command in the asynchronous list element shall become known in the current shell execution environment; see 3.12. This process ID shall remain known until:

  - The command terminates and the application waits for the process ID, or

  - Another asynchronous list is invoked before $! (corresponding to the previous asynchronous list) is expanded in the current execution environment.

The implementation need not retain more than the {CHILD_MAX} most recent entries in its list of known process IDs in the current shell execution environment.
Exit Status

The exit status of an asynchronous list shall be zero.

BEGIN_RATIONALE
3.9.3.1.1 Asynchronous Lists Rationale. (This subclause is not a part of P1003.2)

The grammar treats a construct such as

    foo & bar & bam &

as one "asynchronous list," but since the status of each element is tracked by the shell, the term "element of an asynchronous list" was introduced to identify just one of the foo, bar, bam portions of the overall list.

Unless the implementation has an internal limit, such as {CHILD_MAX}, on the retained process IDs, it would require unbounded memory for the following example:

    while true
    do      foo & echo $!
    done

The treatment of the signals SIGINT and SIGQUIT with asynchronous lists is described in 3.11.

Since the connection of the input to the equivalent of /dev/null is considered to occur before redirections, the following script would produce no output:

    exec < /etc/passwd
    cat <&0 &
    wait
END_RATIONALE

3.9.3.2 Sequential Lists

Commands that are separated by a semicolon (;) shall be executed sequentially.

The format for executing commands sequentially is:

    command1 [; command2] ...

Each command shall be expanded and executed in the order specified.

Exit Status

The exit status of a sequential list shall be the exit status of the last command in the list.

3.9.3.3 AND Lists

The control operator && shall denote an AND list. The format is:

    command1 [ && command2] ...

First command1 is executed. If its exit status is zero, command2 is executed, and so on until a command has a nonzero exit status or there are no more commands left to execute. The commands shall be expanded only if they are executed.
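The sequential and AND list rules, including the requirement that a command be expanded only if it is executed, can be checked with the following sketch (assuming a conforming sh). The ${flag:=...} expansion serves as a probe: it assigns flag in the current shell only if the word is actually expanded.

```shell
#!/bin/sh
# A sequential list returns the status of its last command.
false; true
seq_status=$?                  # 0

# An AND list stops at the first nonzero status; its status is that of
# the last command actually executed.
steps=
true && steps="${steps}a" && false && steps="${steps}c"
and_status=$?                  # 1, from false
# steps is now "a"

# Commands are expanded only if executed: ${flag:=...} would assign flag
# in this shell as a side effect of expansion.
unset flag
false && echo "${flag:=expanded}"
flag_after_skip=${flag-unset}  # unset
true && echo "${flag:=expanded}" > /dev/null
flag_after_run=${flag-unset}   # expanded
```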
Exit Status

The exit status of an AND list shall be the exit status of the last command that is executed in the list.

3.9.3.4 OR Lists

The control operator || shall denote an OR list. The format is:

    command1 [ || command2] ...

First, command1 is executed. If its exit status is nonzero, command2 is executed, and so on until a command has a zero exit status or there are no more commands left to execute.

Exit Status

The exit status of an OR list shall be the exit status of the last command that is executed in the list.

3.9.4 Compound Commands

The shell has several programming constructs that are compound commands, which provide control flow for commands. Each of these compound commands has a reserved word or control operator at the beginning, and a corresponding terminator reserved word or operator at the end. In addition, each can be followed by redirections on the same line as the terminator. Each redirection shall apply to all the commands within the compound command that do not explicitly override that redirection.

3.9.4.1 Grouping Commands

The format for grouping commands is as follows:

    (compound-list)     Execute compound-list in a subshell environment; see 3.12. Variable assignments and built-in commands that affect the environment shall not remain in effect after the list finishes.

    { compound-list;}   Execute compound-list in the current process environment.

Exit Status

The exit status of a grouping command shall be the exit status of compound-list.

BEGIN_RATIONALE
3.9.4.1.1 Grouping Commands Rationale. (This subclause is not a part of P1003.2)

The semicolon shown in { compound-list;} is an example of a control operator delimiting the } reserved word. Other delimiters are possible, as shown in 3.10; <newline> is frequently used.

A proposal was made to use the do ... done construct in all cases where command grouping in the current process environment is performed, identifying it as a construct for the grouping commands, as well as for shell functions. This was not included because the shell already has a grouping construct for this purpose ({ }), and changing it would have been counter-productive.
END_RATIONALE

3.9.4.2 for Loop

The for loop shall execute a sequence of commands for each member in a list of items. The for loop requires that the reserved words do and done be used to delimit the sequence of commands.

The format for the for loop is as follows.

    for name [ in word ... ]
    do
        compound-list
    done

First, the list of words following in shall be expanded to generate a list of items. Then, the variable name shall be set to each item, in turn, and the compound-list executed each time. If no items result from the expansion, the compound-list shall not be executed. Omitting in word ... is equivalent to in "$@".

Exit Status

The exit status of a for command shall be the exit status of the last command that executes. If there are no items, the exit status shall be zero.

BEGIN_RATIONALE
3.9.4.2.1 for Loop Rationale. (This subclause is not a part of P1003.2)

The format is shown with generous usage of <newline>s.
See the grammar in 3.10 for a precise description of where <newline>s and semicolons can be interchanged.

Some historical implementations support { and } as substitutes for do and done. The working group chose to omit them, even as an obsolescent feature. (Note that these substitutes were only for the for command; the while and until commands could not use them historically, because they are followed by compound-lists that may contain {...} grouping commands themselves.)

The reserved word pair do ... done was selected rather than do ... od (which would have matched the spirit of if ... fi and case ... esac) because od is a commonly used utility name, and this would have been an unacceptable choice.

END_RATIONALE

3.9.4.3 case Conditional Construct

The conditional construct case shall execute the compound-list corresponding to the first one of several patterns (see 3.13) that is matched by the string resulting from the tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal of the given word. The reserved word in shall denote the beginning of the patterns to be matched. Multiple patterns with the same compound-list are delimited by the | symbol. The control operator ) terminates a list of patterns corresponding to a given action. The compound-list for each list of patterns is terminated with ;;. The case construct terminates with the reserved word esac (case reversed). The format for the case construct is as follows:

    case word in
        [(]pattern1) compound-list;;
        [(]pattern2|pattern3) compound-list;;
        ...
    esac

The ;; is optional for the last compound-list. Each pattern in a pattern list shall be expanded and compared against the expansion of word. After the first match, no more patterns shall be expanded, and the compound-list shall be executed.
The order of expansion and comparison of patterns in a multiple pattern list is unspecified.

Exit Status

The exit status of case shall be zero if no patterns are matched. Otherwise, the exit status shall be the exit status of the last command executed in the compound-list.

BEGIN_RATIONALE

3.9.4.3.1 case Conditional Construct Rationale. (This subclause is not a part of P1003.2)

An optional open-parenthesis before pattern was added to allow numerous historical KornShell scripts to conform. At one time, using the leading parenthesis was required if the case statement were to be embedded within a $( ) command substitution; this is no longer the case with the POSIX shell. Nevertheless, many existing scripts use the open-parenthesis, if only because it makes matching-parenthesis searching easier in vi and other editors. This is a relatively simple implementation change that is fully upward compatible for all scripts.

Consideration was given to requiring break inside the compound-list to prevent falling through to the next pattern action list. This was rejected as not being existing practice. An interesting undocumented feature of the KornShell is that using ;& instead of ;; as a terminator causes the exact opposite behavior: the flow of control continues with the next compound-list.

The pattern "*", given as the last pattern in a case construct, is equivalent to the default case in a C-language switch statement.

The grammar shows that reserved words can be used as patterns, even if one is the first word on a line. Obviously, the reserved word esac cannot be used in this manner.
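A minimal sketch of these points, assuming a POSIX-conforming sh. The | symbol lets two patterns share one action, the optional open-parenthesis is accepted, and a final * pattern acts as the default case. The function name classify is hypothetical.

```sh
#!/bin/sh
# Illustrative sketch only: classify is a hypothetical function name.
classify() {
    case $1 in
    *.c|*.h)   echo "C source" ;;      # two patterns share one action
    (*.sh)     echo "shell script" ;;  # optional leading "("
    ?)         echo "one character" ;; # ? matches any single character
    *)         echo "other" ;;         # like a C-language default case
    esac
}

classify main.c    # prints: C source
classify run.sh    # prints: shell script
classify x         # prints: one character
classify README    # prints: other
```

The first matching pattern wins, so the * item is only reached when no earlier pattern matches.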
END_RATIONALE

3.9.4.4 if Conditional Construct

The if command shall execute a compound-list and use its exit status to determine whether to execute another compound-list. The format for the if construct is as follows:

    if compound-list
    then
        compound-list
    [elif compound-list
    then
        compound-list] ...
    [else
        compound-list]
    fi

The if compound-list is executed; if its exit status is zero, the then compound-list is executed and the command shall complete. Otherwise, each elif compound-list is executed, in turn, and if its exit status is zero, the then compound-list is executed and the command shall complete. Otherwise, the else compound-list is executed.

Exit Status

The exit status of the if command shall be the exit status of the then or else compound-list that was executed, or zero, if none was executed.

BEGIN_RATIONALE

3.9.4.4.1 if Conditional Construct Rationale. (This subclause is not a part of P1003.2)

The precise format for the command syntax is described in 3.10.

END_RATIONALE

3.9.4.5 while Loop

The while loop shall continuously execute one compound-list as long as another compound-list has a zero exit status. The format of the while loop is as follows:

    while compound-list-1
    do
        compound-list-2
    done

The compound-list-1 shall be executed, and if it has a nonzero exit status, the while command shall complete. Otherwise, the compound-list-2 shall be executed, and the process shall repeat.
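The sketch below illustrates the if construct, the while loop, and the until loop of 3.9.4.6. It assumes a POSIX-conforming sh and the test utility ([). The function names sign, countdown, countup, and never_runs_status are hypothetical.

```sh
#!/bin/sh
# Illustrative sketch only: the function names are hypothetical.

# if/elif/else: the first list with a zero exit status selects the branch.
sign() {
    if [ "$1" -lt 0 ]
    then
        echo negative
    elif [ "$1" -eq 0 ]
    then
        echo zero
    else
        echo positive
    fi
}

# while runs the body as long as the condition list succeeds.
countdown() {
    i=$1
    while [ "$i" -gt 0 ]
    do
        printf '%s ' "$i"
        i=$((i - 1))
    done
}

# until is the complement: the body runs as long as the list fails.
countup() {
    i=1
    until [ "$i" -gt "$1" ]
    do
        printf '%s ' "$i"
        i=$((i + 1))
    done
}

# A loop whose body never runs has an exit status of zero.
never_runs_status() {
    while false
    do
        echo "not reached"
    done
    echo "$?"
}

sign -5; sign 0; sign 7     # prints negative, zero, positive (one per line)
countdown 3; echo           # prints: 3 2 1
countup 3; echo             # prints: 1 2 3
never_runs_status           # prints: 0
```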
Exit Status

The exit status of the while loop shall be the exit status of the last compound-list-2 executed, or zero if none was executed.

BEGIN_RATIONALE

3.9.4.5.1 while Loop Rationale. (This subclause is not a part of P1003.2)

The precise format for the command syntax is described in 3.10.

END_RATIONALE

3.9.4.6 until Loop

The until loop shall continuously execute one compound-list as long as another compound-list has a nonzero exit status. The format of the until loop is as follows:

    until compound-list-1
    do
        compound-list-2
    done

The compound-list-1 shall be executed, and if it has a zero exit status, the until command shall complete. Otherwise, the compound-list-2 shall be executed, and the process shall repeat.

Exit Status

The exit status of the until loop shall be the exit status of the last compound-list-2 executed, or zero if none was executed.

BEGIN_RATIONALE

3.9.4.6.1 until Loop Rationale. (This subclause is not a part of P1003.2)

The precise format for the command syntax is described in 3.10.

END_RATIONALE

3.9.5 Function Definition Command

A function is a user-defined name that is used as a simple command to call a compound command with new positional parameters. A function is defined with a function definition command. The format of a function definition command is as follows:

    fname() compound-command [io-redirect ...]

The function is named fname; it shall be a name (see 3.1.5). An implementation may allow other characters in a function name as an extension.
The implementation shall maintain separate namespaces for functions and variables.

The argument compound-command represents a compound command, as described in 3.9.4. When the function is declared, none of the expansions in 3.6 shall be performed on the text in compound-command or io-redirect; all expansions shall be performed as normal each time the function is called. Similarly, the optional io-redirect redirections and any variable assignments within compound-command shall be performed during the execution of the function itself, not the function definition. See 3.8.1 for the consequences of failures of these operations on interactive and noninteractive shells.

When a function is executed, it shall have the syntax-error and variable-assignment properties described for special built-in utilities, in the enumerated list at the beginning of 3.14.

The compound-command shall be executed whenever the function name is specified as the name of a simple command (see 3.9.1.1). The operands to the command shall temporarily become the positional parameters during the execution of the compound-command; the special parameter # shall also be changed to reflect the number of operands. The special parameter 0 shall be unchanged. When the function completes, the values of the positional parameters and the special parameter # shall be restored to the values they had before the function was executed. If the special built-in return is executed in the compound-command, the function shall complete and execution shall resume with the next command after the function call.

Exit Status

The exit status of a function definition shall be zero if the function was declared successfully; otherwise, it shall be greater than zero. The exit status of a function invocation shall be the exit status of the last command executed by the function.
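A minimal sketch of these function rules, assuming a POSIX-conforming sh. It shows four behaviors:

  - Operands temporarily replace the positional parameters.
  - return ends the function with the given status.
  - Expansions in the body occur at call time.
  - Assignments made in the body persist, because there are no local variables.

The names show_args, greet, greeting, status, and last_call are hypothetical.

```sh
#!/bin/sh
# Illustrative sketch only: all function and variable names are
# hypothetical.

# Operands become $1, $2, ... and $# only while the function runs.
show_args() {
    echo "$# args, first=$1"
    last_call=$1      # no local variables: this assignment persists
    return 3          # ends the function; its exit status is 3
}

# Expansions in the body happen at each call, not at definition time.
greeting=hello
greet() {
    echo "$greeting, $1"
}
greeting=goodbye

set -- a b c d                  # the caller's positional parameters
status=0
show_args x y || status=$?      # prints: 2 args, first=x
echo "status=$status last_call=$last_call"   # prints: status=3 last_call=x
echo "caller has $# args, first=$1"          # prints: caller has 4 args, first=a
greet world                                  # prints: goodbye, world
```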
BEGIN_RATIONALE

3.9.5.1 Function Definition Command Rationale. (This subclause is not a part of P1003.2)

The description of functions in Draft 8 was based on the notion that functions should behave like miniature shell scripts. That is, except for sharing variables, most elements of an execution environment should behave as if it were a new execution environment, and changes to these should be local to the function. For example, traps and options should be reset on entry to the function, and any changes to them should not affect the traps or options of the caller. There were numerous objections to this basic idea. The opponents asserted that functions were intended to be a convenient mechanism for grouping commonly executed commands that were to be executed in the current execution environment, similar to the execution of the dot special built-in.

Opponents also pointed out that the functions described in Draft 8 did not scope everything a new shell script would anyway, such as the current working directory or umask, but instead picked a few select properties. The basic argument was that if one wanted scoping of the execution environment, the mechanism already exists: put the commands in a new shell script and call it. All traditional shells that implemented functions, other than the KornShell, have implemented functions that operate in the current execution environment. Because of this, Draft 9 removed any local scoping of traps or options.

Local variables within a function were considered and included in Draft 9 (controlled by the special built-in local). They were removed because they do not fit the simple model developed for the scoping of functions, and because there was some opposition to adding yet another new special built-in from outside existing practice.
Implementations should reserve the identifier local (as well as typeset, as used in the KornShell) in case this local variable mechanism is adopted in a future version of POSIX.2.

A separate issue from the execution environment of a function is the availability of that function to child shells. A few objectors, including the author of the original Version 7 UNIX system shell, maintained that just as a variable can be shared with child shells by exporting it, so should a function. This capability was therefore added to earlier drafts of the standard, in which the export command had a -f flag for exporting functions. Functions that were exported were to be put into the environment as name()=value pairs. Upon invocation, the shell would scan the environment for these pairs and automatically define the functions. This facility received a lot of balloting opposition and was removed from Draft 11. Some of the arguments against exportable functions were:

  - There was little existing practice. The Ninth Edition shell provided them, but there was controversy over how well it worked.
  - There are numerous security problems associated with functions appearing in a script's environment and overriding standard utilities or the application's own utilities.
  - There was controversy over requiring make to import functions, where it has historically used an exec function for many of its command line executions.
  - Functions can be big and the environment is of a limited size. (The counter-argument was that functions are no different than variables in terms of size: there can be big ones, and there can be small ones, and just as one does not export huge variables, one does not export huge functions.
    However, this insight might be lost on the average shell-function writer, who typically writes much larger functions than variables.)

As far as can be determined, the functions in POSIX.2 match those in System V. The KornShell has two methods of defining functions:

    function fname { compound-list }

and

    fname() { compound-list }

The latter uses the same definition as POSIX.2, but differs in semantics, as described previously. A future edition of the KornShell is planned to align the latter syntax with POSIX and keep the former as-is.

The name space for functions is limited to that of a name because of historical practice. Complications in defining the syntactic rules for the function definition command, and in dealing with known extensions such as the KornShell's @(), prevented the name space from being widened to a word, as requested by some balloters. Using functions to support synonyms such as the C-shell's !! and % is thus disallowed for portable applications, but acceptable as an extension. For interactive users, the aliasing facilities in the UPE should be adequate for this purpose. It is recognized that the name space for utilities in the file system is wider than that currently supported for functions, if the portable filename character set guidelines are ignored. However, it did not seem useful to mandate extensions in systems for so little benefit to portable applications.

The () in the function definition command consists of two operators. Therefore, intermixing <blank>s with the fname, (, and ) is allowed, but unnecessary.
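A small illustration of this point, assuming a POSIX-conforming sh. The two definitions below differ only in the <blank>s around the ( and ) operators, so they should be accepted and behave identically. The function names greet_a and greet_b are hypothetical.

```sh
#!/bin/sh
# Illustrative sketch only: greet_a and greet_b are hypothetical names.
# "(" and ")" are separate operators, so blanks around them are allowed.
greet_a() { echo "hello, $1"; }
greet_b ( ) { echo "hello, $1"; }

greet_a world    # prints: hello, world
greet_b world    # prints: hello, world
```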
An example of how a function definition can be used wherever a simple command is allowed:

    # If variable i is equal to "yes",
    # define function foo to be ls -l
    #
    [ "X$i" = Xyes ] && foo() {
        ls -l
    }

END_RATIONALE

3.10 Shell Grammar

The following grammar describes the Shell Command Language. Any discrepancies found between this grammar and the preceding description shall be resolved in favor of this clause.

3.10.1 Shell Grammar Lexical Conventions

The input language to the shell must be first recognized at the character level. The resulting tokens shall be classified by their immediate context according to the following rules (applied in order). These rules are used to determine what a ``token'' that is subject to parsing at the token level is. The rules for token recognition in 3.3 shall apply.

    (1) A <newline> shall be returned as the token identifier NEWLINE.

    (2) If the token is an operator, the token identifier for that operator shall result.

    (3) If the string consists solely of digits and the delimiter character is one of < or >, the token identifier IO_NUMBER shall be returned.

    (4) Otherwise, the token identifier TOKEN shall result.

Further distinction on TOKEN is context-dependent. It may be that the same TOKEN yields WORD, a NAME, an ASSIGNMENT_WORD, or one of the reserved words below, depending upon the context. Some of the productions in the grammar below are annotated with a rule number from the following list. When a TOKEN is seen where one of those annotated productions could be used to reduce the symbol, the applicable rule shall be applied to convert the token identifier type of the TOKEN to a token identifier acceptable at that point in the grammar. The reduction shall then proceed based upon the token identifier type yielded by the rule applied. When more than one rule applies, the highest numbered rule shall apply
(which in turn may refer to another rule). [Note that except in rule (7), the presence of an = in the token has no effect.]

The WORD tokens shall have the word expansion rules applied to them immediately before the associated command is executed, not at the time the command is parsed.

3.10.2 Shell Grammar Rules

(1) [Command Name]
    When the TOKEN is exactly a reserved word, the token identifier for that reserved word shall result. Otherwise, the token WORD shall be returned. Also, if the parser is in any state where only a reserved word could be the next correct token, proceed as above.

    NOTE: Because at this point quote marks are retained in the token, quoted strings cannot be recognized as reserved words. This rule also implies that reserved words will not be recognized except in certain positions in the input, such as after a <newline> or semicolon. The grammar presumes that if the reserved word is intended, it will be properly delimited by the user, and does not attempt to reflect that requirement directly. Also note that line joining is done before tokenization, as described in 3.2.1, so escaped newlines are already removed at this point.

    NOTE: Rule (1) is not directly referenced in the grammar, but is referred to by other rules, or applies globally.

(2) [Redirection to/from filename]
    The expansions specified in 3.7 shall occur. As specified there, exactly one field can result (or the result is unspecified), and there are additional requirements on pathname expansion.

(3) [Redirection from here-document]
    Quote removal [3.7.4] shall be applied to the word to determine the delimiter that will be used to find the end of the here-document that begins after the next <newline>.

(4) [Case statement termination]
    When the TOKEN is exactly the reserved word Esac, the token identifier for Esac shall result.
    Otherwise, the token WORD shall be returned.

(5) [NAME in for]
    When the TOKEN meets the requirements for a name [3.1.5], the token identifier NAME shall result. Otherwise, the token WORD shall be returned.

(6) [Third word of for and case]
    When the TOKEN is exactly the reserved word In, the token identifier for In shall result. Otherwise, the token WORD shall be returned.

(7) [Assignment preceding command name]

    (a) [When the first word]
        If the TOKEN does not contain the character =, rule (1) shall be applied. Otherwise, apply (7)(b).

    (b) [Not the first word]
        If the TOKEN contains the equals-sign character:
          - If it begins with =, the token WORD shall be returned.
          - If all the characters preceding = form a valid name [3.1.5], the token ASSIGNMENT_WORD shall be returned. (Quoted characters cannot participate in forming a valid name.)
          - Otherwise, it is unspecified whether it is ASSIGNMENT_WORD or WORD that is returned.
        Assignment to the NAME shall occur as specified in 3.9.1.

(8) [NAME in function]
    When the TOKEN is exactly a reserved word, the token identifier for that reserved word shall result. Otherwise, when the TOKEN meets the requirements for a name [3.1.5], the token identifier NAME shall result. Otherwise, rule (7) shall apply.

(9) [Body of function]
    Word expansion and assignment shall never occur, even when required by the rules above, when this rule is being parsed. Each TOKEN that might either be expanded or have assignment applied to it shall instead be returned as a single WORD consisting only of characters that are exactly the token described in 3.3.

/* -------------------------------------------------------
   The grammar symbols
   ------------------------------------------------------- */
%token  WORD
%token  ASSIGNMENT_WORD
%token  NAME
%token  NEWLINE
%token  IO_NUMBER

/* The following are the operators mentioned above. */

%token  AND_IF    OR_IF    DSEMI
/*      '&&'      '||'     ';;'    */

%token  DLESS  DGREAT  LESSAND  GREATAND  LESSGREAT  DLESSDASH
/*      '<<'   '>>'    '<&'     '>&'      '<>'       '<<-'   */

%token  CLOBBER
/*      '>|'   */

/* The following are the reserved words. */

%token  If    Then    Else    Elif    Fi    Do    Done
/*      'if'  'then'  'else'  'elif'  'fi'  'do'  'done'   */

%token  Case    Esac    While    Until    For
/*      'case'  'esac'  'while'  'until'  'for'   */

/* These are reserved words, not operator tokens, and are
   recognized when reserved words are recognized. */

%token  Lbrace    Rbrace    Bang
/*      '{'       '}'       '!'   */

%token  In
/*      'in'   */

/* -------------------------------------------------------
   The Grammar
   ------------------------------------------------------- */

%start  complete_command
%%
complete_command : list separator
                 | list
                 ;
list             : list separator_op and_or
                 |                   and_or
                 ;
and_or           :                         pipeline
                 | and_or AND_IF linebreak pipeline
                 | and_or OR_IF  linebreak pipeline
                 ;
pipeline         :      pipe_sequence
                 | Bang pipe_sequence
                 ;
pipe_sequence    :                             command
                 | pipe_sequence '|' linebreak command
                 ;
command          : simple_command
                 | compound_command
                 | compound_command redirect_list
                 | function_definition
                 ;
compound_command : brace_group
                 | subshell
                 | for_clause
                 | case_clause
                 | if_clause
                 | while_clause
                 | until_clause
                 ;
subshell         : '(' compound_list ')'
                 ;
compound_list    :              term
                 | newline_list term
                 |              term separator
                 | newline_list term separator
                 ;
term             : term separator and_or
                 |                and_or
                 ;
for_clause       : For name                            do_group
                 | For name In wordlist sequential_sep do_group
                 ;
name             : NAME                     /* Apply rule (5) */
                 ;
in               : In                       /* Apply rule (6) */
                 ;
wordlist         : wordlist WORD
                 |          WORD
                 ;
case_clause      : Case WORD In linebreak case_list Esac
                 | Case WORD In linebreak           Esac
                 ;
case_list        : case_list case_item
                 |           case_item
                 ;
case_item        :     pattern ')' linebreak     DSEMI linebreak
                 |     pattern ')' compound_list DSEMI linebreak
                 | '(' pattern ')' linebreak     DSEMI linebreak
                 | '(' pattern ')' compound_list DSEMI linebreak
                 ;
pattern          :             WORD         /* Apply rule (4) */
                 | pattern '|' WORD         /* Do not apply rule (4) */
                 ;
if_clause        : If compound_list Then compound_list else_part Fi
                 | If compound_list Then compound_list           Fi
                 ;
else_part        : Elif compound_list Then compound_list
                 | Elif compound_list Then compound_list else_part
                 | Else compound_list
                 ;
while_clause     : While compound_list do_group
                 ;
until_clause     : Until compound_list do_group
                 ;
function_definition : fname '(' ')' linebreak function_body
                 ;
function_body    : compound_command                /* Apply rule (9) */
                 | compound_command redirect_list  /* Apply rule (9) */
                 ;
fname            : NAME                            /* Apply rule (8) */
                 ;
brace_group      : Lbrace compound_list Rbrace
                 ;
do_group         : Do compound_list Done
                 ;
simple_command   : cmd_prefix cmd_word cmd_suffix
                 | cmd_prefix cmd_word
                 | cmd_prefix
                 | cmd_name cmd_suffix
                 | cmd_name
                 ;
cmd_name         : WORD                   /* Apply rule (7)(a) */
                 ;
cmd_word         : WORD                   /* Apply rule (7)(b) */
                 ;
cmd_prefix       :            io_redirect
                 | cmd_prefix io_redirect
                 |            ASSIGNMENT_WORD
                 | cmd_prefix ASSIGNMENT_WORD
                 ;
cmd_suffix       :            io_redirect
                 | cmd_suffix io_redirect
                 |            WORD
                 | cmd_suffix WORD
                 ;
redirect_list    :               io_redirect
                 | redirect_list io_redirect
                 ;
io_redirect      :           io_file
                 | IO_NUMBER io_file
                 |           io_here
                 | IO_NUMBER io_here
                 ;
io_file          : '<'       filename
                 | LESSAND   filename
                 | '>'       filename
                 | GREATAND  filename
                 | DGREAT    filename
                 | LESSGREAT filename
                 | CLOBBER   filename
                 ;
filename         : WORD                   /* Apply rule (2) */
                 ;
io_here          : DLESS     here_end
                 | DLESSDASH here_end
                 ;
here_end         : WORD                   /* Apply rule (3) */
                 ;
newline_list     :              NEWLINE
                 | newline_list NEWLINE
                 ;
linebreak        : newline_list
                 | /* empty */
                 ;
separator_op     : '&'
                 | ';'
                 ;
separator        : separator_op linebreak
                 | newline_list
                 ;
sequential_sep   : ';' linebreak
                 | newline_list
                 ;

BEGIN_RATIONALE

3.10.3 Shell Grammar Rationale. (This subclause is not a part of P1003.2)

There are several subtle aspects of this grammar where conventional usage implies rules about the grammar that in fact are not true.

For compound_list, only the forms that end in a separator allow a reserved word to be recognized. Usually, therefore, only a separator can be used where a compound list precedes a reserved word (such as Then, Else, Do, and Rbrace). Explicitly requiring a separator would disallow such valid (if rare) statements as:

    if (false) then (echo x) else (echo y) fi

See the NOTE under special grammar rule (1).

Concerning the third sentence of rule (1) (``Also, if the parser ...''):

  - This sentence applies rather narrowly. When a compound list is terminated by some clear delimiter (such as the closing fi of an inner if_clause), then it would apply. Where the compound list might continue (as in after a ;), rule (7)(a) [and consequently the first sentence of rule (1)] would apply. In many instances the two conditions are identical, but this part of rule (1) does not give license to treating a WORD as a reserved word unless it is in a place where a reserved word must appear.
  - The statement is equivalent to requiring that when the LR(1) lookahead set contains exactly a reserved word, it must be recognized if it is present. (Here ``LR(1)'' refers to the theoretical concepts, not to any real parser generator.)

    For example, in the construct below, when the parser is at the point marked with ^, the only next legal token is then (this follows directly from the grammar rules).

        if if....fi then .... fi
                    ^

    At that point, the then must be recognized as a reserved word.

    (Depending on the parser generator actually used, ``extra'' reserved words may be in some lookahead sets. It does not really matter if they are recognized, or even if any possible reserved word is recognized in that state. If a reserved word is recognized and is not in the (theoretical) LR(1) lookahead set, an error will ultimately be detected. In the example above, if some other reserved word (e.g., while) is also recognized, an error will occur later.)

    This is approximately equivalent to saying that reserved words are recognized after other reserved words (because it is after a reserved word that this condition will occur). However, it avoids the ``except for...'' list that would be required for case, for, etc. (Reserved words are of course recognized anywhere a simple_command can appear, as well. Other rules take care of the special cases of nonrecognition, such as rule (4) for case statements.)

Note that the bodies of here-documents are handled by token recognition (see 3.3) and do not appear in the grammar directly. (However, the here-document I/O redirection operator is handled as part of the grammar.)

The start symbol of the grammar (complete_command) represents either input from the command line or a shell script.
It is repeatedly applied by the interpreter to its input, and represents a single ``chunk'' of that input as seen by the interpreter.

The processing of here-documents is handled as part of token recognition (see 3.3) rather than as part of the grammar.

END_RATIONALE

3.11 Signals and Error Handling

When a command is in an asynchronous list, the shell shall prevent SIGQUIT and SIGINT signals from the keyboard from interrupting the command. Otherwise, signals shall have the values inherited by the shell from its parent (see also 3.14.13).

When a signal for which a trap has been set is received while the shell is waiting for the completion of a utility executing a foreground command, the trap associated with that signal shall not be executed until after the foreground command has completed. When the shell is waiting, by means of the wait utility, for asynchronous commands to complete, the reception of a signal for which a trap has been set shall cause the wait utility to return immediately with an exit status greater than 128. Immediately after that, the trap associated with that signal shall be taken.

If multiple signals are pending for the shell for which there are associated trap actions (see 3.14.13), the order of execution of trap actions is unspecified.

3.12 Shell Execution Environment

A shell execution environment consists of the following:

  - Open files inherited upon invocation of the shell, plus open files controlled by exec.
  - Working directory as set by cd (see 4.5).
  - File creation mask set by umask (see 4.67).
  - Current traps set by trap (see 3.14.13).
  - Shell parameters that are set by variable assignment (see set in 3.14.11) or from the POSIX.1 {8} environment inherited by the shell when it begins (see export in 3.14.8).
  - Shell functions (see 3.9.5).
  - Options turned on at invocation or by set.
  - Process IDs of the last commands in asynchronous lists known to this shell environment; see 3.9.3.1.

Utilities other than the special built-ins (see 3.14) shall be invoked in a separate environment that consists of the following. The initial value of these objects shall be the same as that for the parent shell, except as noted below.

  - Open files inherited on invocation of the shell, open files controlled by the exec special built-in (see 3.14.6), plus any modifications and additions specified by any redirections to the utility.
  - Current working directory.
  - File creation mask.
  - If the utility is a shell script, traps caught by the shell shall be set to the default values and traps ignored by the shell shall be set to be ignored by the utility. If the utility is not a shell script, the trap actions (default or ignore) shall be mapped into the appropriate signal handling actions for the utility.
  - Variables with the export attribute, along with those explicitly exported for the duration of the command, shall be passed to the utility as POSIX.1 {8} environment variables.

The environment of the shell process shall not be changed by the utility unless explicitly specified by the utility description (for example, cd and umask).

A subshell environment shall be created as a duplicate of the shell environment, except that signal traps set by that shell environment shall be set to the default values. Changes made to the subshell environment shall not affect the shell environment. Command substitution, commands that are grouped with parentheses, and asynchronous lists shall be executed in a subshell environment.
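A minimal sketch of these rules, assuming a POSIX-conforming sh, with sh also available in PATH as an example of a separate utility. The function and variable names are hypothetical. The sketch shows two things. First, changes made in a command substitution or in a ( ) group do not reach the parent shell. Second, an assignment preceding a utility name is passed to that utility only.

```sh
#!/bin/sh
# Illustrative sketch only: the function and variable names are
# hypothetical; "sh -c" stands in for any separate utility.

# Command substitution and ( ) run in subshell environments, so
# changes made there do not reach the parent shell.
subshell_demo() {
    v=parent
    captured=$(v=changed; echo "$v")
    ( v=also_changed; cd / )
    echo "$v $captured"
}

# The working directory survives a cd inside parentheses.
cwd_unchanged() {
    before=$(pwd)
    ( cd / )
    [ "$(pwd)" = "$before" ] && echo yes
}

# An assignment before a utility name is exported to that utility
# for the duration of the command only.
export_for_command() {
    unset DEMO_VAR
    inside=$(DEMO_VAR=visible sh -c 'echo "${DEMO_VAR:-unset}"')
    echo "$inside ${DEMO_VAR:-unset}"
}

subshell_demo        # prints: parent changed
cwd_unchanged        # prints: yes
export_for_command   # prints: visible unset
```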
Additionally, each command of a multicommand pipeline is in a subshell environment; as an extension, however, any or all commands in a pipeline may be executed in the current environment. All other commands shall be executed in the current shell environment.

BEGIN_RATIONALE

3.12.0.1 Shell Execution Environment Rationale. (This subclause is not a part of P1003.2)

Some systems have implemented the last stage of a pipeline in the current environment so that commands such as

    command | read foo

set variable foo in the current environment. It was decided to allow this extension, but not require it; therefore, a shell programmer should consider a pipeline to be in a subshell environment, but not depend on it.

The previous description of execution environment failed to mention that each command in a multiple command pipeline could be in a subshell execution environment. For compatibility with some existing shells, the wording was phrased to allow an implementation to place any or all commands of a pipeline in the current environment. However, this means that a POSIX application must assume each command is in a subshell environment, but not depend on it.

The wording about shell scripts is meant to convey that ``trap actions'' can only be understood in the context of the shell command language. Outside this context, such as in a C-language program, signals are the operative condition, not traps.

END_RATIONALE

3.13 Pattern Matching Notation

The pattern matching notation described in this clause is used to specify patterns for matching strings in the shell. Historically, pattern matching notation is related to, but slightly different from, the regular expression notation described in 2.8.
For this reason, the description of the rules for this pattern matching notation is based on the description of regular expression notation.

BEGIN_RATIONALE

3.13.0.1 Pattern Matching Notation Rationale. (This subclause is not a part of P1003.2)

Pattern matching is a simpler concept and has a simpler syntax than regular expressions, as the former is generally used for the manipulation of filenames, which are relatively simple collections of characters, while the latter is generally used to manipulate arbitrary text strings of potentially greater complexity. However, some of the basic concepts are the same, so this clause points liberally to the detailed descriptions in 2.8.

END_RATIONALE

3.13.1 Patterns Matching a Single Character

The following patterns matching a single character shall match a single character: ordinary characters, special pattern characters, and pattern bracket expressions. The pattern bracket expression also shall match a single collating element.

An ordinary character is a pattern that shall match itself. It can be any character in the supported character set except for NUL, those special shell characters in 3.2 that require quoting, and the following three special pattern characters. Matching shall be based on the bit pattern used for encoding the character, not on the graphic representation of the character. If any character (ordinary, shell special, or pattern special) is quoted, that pattern shall match the character itself. The shell special characters always require quoting.

When unquoted and outside a bracket expression, the following three characters shall have special meaning in the specification of patterns:

?    A question-mark is a pattern that shall match any character.

*    An asterisk is a pattern that shall match multiple characters, as described in 3.13.2.
[    The open bracket shall introduce a pattern bracket expression. The description of basic regular expression bracket expressions in 2.8.3.2 also shall apply to the pattern bracket expression, except that the exclamation-mark character (!) shall replace the circumflex character (^) in its role in a nonmatching list in the regular expression notation. A bracket expression starting with an unquoted circumflex character produces unspecified results.

When pattern matching is used where shell quote removal is not performed [such as in the argument to the find -name primary when find is being called using an exec function, or in the pattern argument to the fnmatch() function], special characters can be escaped to remove their special meaning by preceding them with a backslash character. This escaping backslash shall be discarded. The sequence \\ shall represent one literal backslash. All of the requirements and effects of quoting on ordinary, shell special, and special pattern characters shall apply to escaping in this context.

BEGIN_RATIONALE

3.13.1.1 Patterns Matching a Single Character Rationale. (This subclause is not a part of P1003.2)

Both ``quoting'' and ``escaping'' are described here because pattern matching must work in three separate circumstances:

- Calling directly upon the shell, such as in pathname expansion or in a case statement. All of the following will match the string or file abc:

      abc  "abc"  a"b"c  a\bc  a[b]c  a["b"]c  a[\b]c  a?c  a*c

  The following will not:

      "a?c"  a\*c  a\[b]c  a["\b"]c

- Calling a utility or function without going through a shell, as described for find and fnmatch().

- Calling utilities such as find or pax through the shell command line.
  (Although find and pax are the only instances of this in the standard utilities, describing it globally here is useful for future utilities that may use pattern matching internally.) In this case, shell quote removal is performed before the utility sees the argument. For example, in

      find /bin -name "e\c[\h]o" -print

  after quote removal, the backslashes are presented to find and it treats them as escape characters. Both precede ordinary characters, so the c and h represent themselves and echo would be found on many historical systems (that have it in /bin). To find a filename that contains shell special characters or pattern characters, both quoting and escaping are required, such as

      pax -r ... "*a\(\?"

  to extract a filename ending with ``a(?''.

Conforming applications are required to quote or escape the shell special characters (called ``metacharacters'' in some historical documentation). If used without this protection, syntax errors can result or implementation extensions can be triggered. For example, the KornShell supports a series of extensions based on parentheses in patterns.

The restriction on circumflex in a bracket expression is to allow implementations that support pattern matching using circumflex as the negation character in addition to the exclamation-mark.

END_RATIONALE

3.13.2 Patterns Matching Multiple Characters

The following rules are used to construct patterns matching multiple characters from patterns matching a single character:

(1)  The asterisk (*) is a pattern that shall match any string, including the null string.
(2)  The concatenation of patterns matching a single character is a valid pattern that shall match the concatenation of the single characters or collating elements matched by each of the concatenated patterns.

(3)  The concatenation of one or more patterns matching a single character with one or more asterisks is a valid pattern. In such patterns, each asterisk shall match a string of zero or more characters, matching the greatest possible number of characters that still allows the remainder of the pattern to match the string.

BEGIN_RATIONALE

3.13.2.1 Patterns Matching Multiple Characters Rationale. (This subclause is not a part of P1003.2)

Since each asterisk matches ``zero or more'' occurrences, the patterns a*b and a**b have identical functionality.

Examples:

    a[bc]   matches the strings ab and ac.
    a*d     matches the strings ad, abd, and abcd, but not the string abc.
    a*d*    matches the strings ad, abcd, abcdef, aaaad, and adddd.
    *a*d    matches the strings ad, abcd, efabcd, aaaad, and adddd.

END_RATIONALE

3.13.3 Patterns Used for Filename Expansion

The rules described so far in 3.13.1 and 3.13.2 are qualified by the following rules that apply when pattern matching notation is used for filename expansion.

(1)  The slash character in a pathname shall be explicitly matched by using one or more slashes in the pattern; it cannot be matched by the asterisk or question-mark special characters or by a bracket expression. Slashes in the pattern are identified before bracket expressions; thus, a slash cannot be included in a pattern bracket expression used for filename expansion.
(2)  If a filename begins with a period (.), the period shall be explicitly matched by using a period as the first character of the pattern or immediately following a slash character. The leading period shall not be matched by:

     - The asterisk or question-mark special characters, or
     - A bracket expression containing a nonmatching list (such as [!a]), a range expression (such as [%-0]), or a character class expression (such as [[:punct:]]).

     It is unspecified whether an explicit period in a bracket expression matching list (such as [.abc]) can match a leading period in a filename.

(3)  Specified patterns are matched against existing filenames and pathnames, as appropriate. Each component that contains a pattern character requires read permission in the directory containing that component. Any component that does not contain a pattern character requires search permission. For example, given the pattern

         /foo/bar/x*/bam

     search permission is needed for directory /foo, search and read permissions are needed for directory bar, and search permission is needed for each x* directory. If the pattern matches any existing filenames or pathnames, the pattern shall be replaced with those filenames and pathnames, sorted according to the collating sequence in effect in the current locale. If the pattern contains an invalid bracket expression or does not match any existing filenames or pathnames, the pattern string shall be left unchanged.

BEGIN_RATIONALE

3.13.3.1 Patterns Used for Filename Expansion Rationale. (This subclause is not a part of P1003.2)

The caveat about a slash within a bracket expression is derived from historical practice. The pattern a[b/c]d will not match such pathnames as abd or a/d. It will only match a pathname of literally a[b/c]d.
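Rules (2) and (3) can be observed directly. The sketch below is illustrative only; it assumes a POSIX sh and a writable $TMPDIR (or /tmp), and the function name glob_demo is hypothetical. It creates a scratch directory holding a, b, and .hidden, then reports what each pattern expands to.

```shell
#!/bin/sh
# Hypothetical demo of the filename-expansion rules for leading periods and
# for patterns that match nothing. The function body is a subshell, so the
# cd and set -- below do not affect the caller.
glob_demo() (
    dir=${TMPDIR:-/tmp}/globdemo.$$
    mkdir "$dir" && cd "$dir" || exit 1
    : > a; : > b; : > .hidden        # two ordinary files and one dot file

    set -- *;     echo "star: $*"    # * does not match the leading period
    set -- [!a]*; echo "not-a: $*"   # nor does a nonmatching list
    set -- .h*;   echo "dot: $*"     # an explicit leading period does
    set -- z*;    echo "none: $*"    # no match: the pattern is left unchanged

    cd / && rm -rf "$dir"
)

glob_demo
```

Run as shown, this should print star: a b, not-a: b, dot: .hidden, and none: z* on four lines.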
Filenames beginning with a period historically have been specially protected from view on UNIX systems. A proposal to allow an explicit period in a bracket expression to match a leading period was considered; it is allowed as an implementation extension, but a conforming application cannot make use of it. If this extension becomes popular in the future, it will be considered for a future version of POSIX.2.

Historical systems have varied in their permissions requirements. Matching f*/bar has required read permission on the f* directories in the System V shell, but this standard, the C shell, and the KornShell require only search permission.

END_RATIONALE

3.14 Special Built-in Utilities

The following special built-in utilities shall be supported in the shell command language. The output of each command, if any, shall be written to standard output, subject to the normal redirection and piping possible with all commands.

The term built-in implies that the shell can execute the utility directly and does not need to search for it. An implementation can choose to make any utility a built-in; however, the special built-in utilities described here differ from regular built-in utilities in two respects:

(1)  A syntax error in a special built-in utility may cause a shell executing that utility to abort, while a syntax error in a regular built-in utility shall not cause a shell executing that utility to abort. (See 3.8.1 for the consequences of errors on interactive and noninteractive shells.) If a special built-in utility encountering a syntax error does not abort the shell, its exit value shall be nonzero.
(2)  Variable assignments specified with special built-in utilities shall remain in effect after the built-in completes; this shall not be the case with a regular built-in or other utility.

As described in 2.3, the special built-in utilities in this clause need not be provided in a manner accessible via the POSIX.1 {8} exec family of functions.

Some of the special built-ins are described as conforming to the utility argument syntax guidelines in 2.10.2. For those that are not, the requirement in 2.11.3 that "--" be recognized as a first argument to be discarded does not apply and a conforming application shall not use that argument.

3.14.1 break - Exit from for, while, or until loop

    break [n]

Exit from the smallest enclosing for, while, or until loop, if any; or from the nth enclosing loop if n is specified. The value of n is an unsigned decimal integer >= 1. The default is equivalent to n=1. If n is greater than the number of enclosing loops, the outermost enclosing loop shall be exited. Execution continues with the command immediately following the loop.

Exit Status

    0    Successful completion.
    >0   The n value was not an unsigned decimal integer >= 1.

BEGIN_RATIONALE

3.14.1.1 break Rationale. (This subclause is not a part of P1003.2)

Example:

    for i in *
    do
        if test -d "$i"
        then break
        fi
    done

Consideration was given to expanding the syntax of the break and continue utilities to refer to a label associated with the appropriate loop, as a preferable alternative to the [n] method. This new method was proposed late in the development of the standard and adequate consensus could not be formed to include it. However, POSIX.2 does reserve the namespace of command names ending with a colon.
It is anticipated that a future implementation could take advantage of this and provide something like:

    outofloop: for i in a b c d e
    do
        for j in 0 1 2 3 4 5 6 7 8 9
        do
            if test -r "${i}${j}"
            then break outofloop
            fi
        done
    done

and that this might be standardized after implementation experience is achieved.

END_RATIONALE

3.14.2 colon - Null utility

    : [argument ...]

This utility shall only expand command arguments.

Exit Status

Zero.

BEGIN_RATIONALE

3.14.2.1 colon Rationale. (This subclause is not a part of P1003.2)

The colon (:), or null utility, is used when a command is needed, as in the then clause of an if command, but nothing is to be done by the command.

Example:

    : ${X=abc}
    if     false
    then   :
    else   echo $X
    fi
    abc

As with any of the special built-ins, the null utility can also have variable assignments and redirections associated with it, such as:

    x=y : > z

which sets variable x to the value y (so that it persists after the null utility ``completes'') and creates or truncates file z.

END_RATIONALE

3.14.3 continue - Continue for, while, or until loop

    continue [n]

The continue utility shall return to the top of the smallest enclosing for, while, or until loop, or to the top of the nth enclosing loop, if n is specified. This involves repeating the condition list of a while or until loop or performing the next assignment of a for loop, and reexecuting the loop if appropriate.

The value of n is an unsigned decimal integer >= 1. The default is equivalent to n=1. If n is greater than the number of enclosing loops, the outermost enclosing loop is used.

Exit Status

    0    Successful completion.
    >0   The n value was not an unsigned decimal integer >= 1.

BEGIN_RATIONALE
3.14.3.1 continue Rationale. (This subclause is not a part of P1003.2)

Example:

    for i in *
    do
        if test -d "$i"
        then continue
        fi
    done

END_RATIONALE

3.14.4 dot - Execute commands in current environment

    . file

The shell shall execute commands from the file in the current environment.

If file does not contain a slash, the shell shall use the search path specified by PATH to find the directory containing file. Unlike normal command search, however, the file searched for by the dot utility need not be executable. If no readable file is found, a noninteractive shell shall abort; an interactive shell shall write a diagnostic message to standard error, but this condition shall not be considered a syntax error.

Exit Status

Returns the exit status of the last command executed, or a zero exit status if no command is executed.

BEGIN_RATIONALE

3.14.4.1 dot Rationale. (This subclause is not a part of P1003.2)

Some older implementations searched the current directory for the file, even if the value of PATH disallowed it. This behavior was omitted from POSIX.2 due to concerns about introducing the susceptibility to Trojan horses that the user might be trying to avoid by leaving dot out of PATH.

The KornShell version of dot takes optional arguments that are set to the positional parameters. This is a valid extension that allows a dot script to behave identically to a function.

Example:

    cat foobar
    foo=hello bar=world
    . ./foobar
    echo $foo $bar
    hello world

END_RATIONALE

3.14.5 eval - Construct command by concatenating arguments

    eval [argument ...]
The eval utility shall construct a command by concatenating arguments together, separating each with a space character. The constructed command shall be read and executed by the shell.

Exit Status

If there are no arguments, or only null arguments, eval shall return a zero exit status; otherwise, it shall return the exit status of the command defined by the string of concatenated arguments separated by spaces.

BEGIN_RATIONALE

3.14.5.1 eval Rationale. (This subclause is not a part of P1003.2)

Example:

    foo=10 x=foo
    y='$'$x
    echo $y
    $foo
    eval y='$'$x
    echo $y
    10

END_RATIONALE

3.14.6 exec - Execute commands and open, close, and/or copy file descriptors

    exec [command [argument ...]]

The exec utility opens, closes, and/or copies file descriptors as specified by any redirections as part of the command.

If exec is specified without command or arguments, and any file descriptors with numbers > 2 are opened with associated redirection statements, it is unspecified whether those file descriptors remain open when the shell invokes another utility.

If exec is specified with command, it shall replace the shell with command without creating a new process. If arguments are specified, they are arguments to command. Redirection shall affect the current shell execution environment.

Exit Status

If command is specified, exec shall not return to the shell; rather, the exit status of the process shall be the exit status of the program implementing command, which overlaid the shell. If command is not found, the exit status shall be 127. If command is found, but it is not an executable utility, the exit status shall be 126.
If a redirection error occurs (see 3.8.1), the shell shall exit with a value in the range 1-125. Otherwise, exec shall return a zero exit status.

BEGIN_RATIONALE

3.14.6.1 exec Rationale. (This subclause is not a part of P1003.2)

Most historical implementations are not conformant in that

    foo=bar exec cmd

does not pass foo to cmd.

Earlier drafts stated that ``If specified without command or argument, the shell sets to close-on-exec file numbers greater than 2 that are opened in this way, so that they will be closed when the shell invokes another program.'' This was based on the behavior of one version of the KornShell and was made unspecified when it was realized that some existing scripts relied on the more widespread historical behavior (leaving all file descriptors open). Furthermore, since the application cannot know whether a new shell is simply fork()ed, rather than exec()ed, it could not consistently rely on the automatic closing behavior anyway. Scripts concerned that child shells could misuse open file descriptors can always close them explicitly, as shown in one of the following examples.

Examples:

Open readfile as file descriptor 3 for reading:

    exec 3< readfile

Open writefile as file descriptor 4 for writing:

    exec 4> writefile

Make file descriptor 5 a copy of file descriptor 0:

    exec 5<&0

Close file descriptor 3:

    exec 3<&-

Cat the file maggie by replacing the current shell with the cat utility:

    exec cat maggie

END_RATIONALE

3.14.7 exit - Cause the shell to exit

    exit [n]

The exit utility shall cause the shell to exit with the exit status specified by the unsigned decimal integer n. If n is specified, but its value is not between 0 and 255 inclusive, the exit status is undefined.
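A small sketch of how the exit status reaches the caller, assuming a POSIX sh. The helper status_of is hypothetical; it runs its argument in a child shell started with sh -c only so that exit does not terminate the calling script.

```shell
#!/bin/sh
# Hypothetical helper: run a command string in a child shell and print the
# exit status that the child reported. Using a child shell keeps exit from
# terminating the calling script.
status_of() {
    sh -c "$1"
    echo "$?"
}

status_of 'exit 3'        # n given: prints 3
status_of 'false; exit'   # no n: status of the last command (false), nonzero

# A subshell environment can also be left with exit without ending
# the shell that created it.
(exit 5)
echo "subshell exited with $?"   # prints: subshell exited with 5
```

Only values 0 through 255 are used here, since other values of n give an undefined exit status.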
A trap on EXIT shall be executed before the shell terminates, except when the exit utility is invoked in that trap itself, in which case the shell shall exit immediately.

Exit Status

The exit status shall be n, if specified. Otherwise, the value shall be the exit value of the last command executed, or zero if no command was executed. When exit is executed in a trap action (see 3.14.13), the ``last command'' is considered to be the command that executed immediately preceding the trap action.

BEGIN_RATIONALE

3.14.7.1 exit Rationale. (This subclause is not a part of P1003.2)

As explained in other clauses, certain exit status values have been reserved for special uses and should be used by applications only for those purposes:

    126    A file to be executed was found, but it was not an executable utility.
    127    A utility to be executed was not found.
    >128   A command was interrupted by a signal.

Examples:

Exit with a true value:

    exit 0

Exit with a false value:

    exit 1

END_RATIONALE

3.14.8 export - Set export attribute for variables

    export name[=word]...
    export -p

The shell shall give the export attribute to the variables corresponding to the specified names, which shall cause them to be in the environment of subsequently executed commands.

When -p is specified, export shall write to the standard output the names and values of all exported variables, in the following format:

    "export %s=%s\n", <name>, <value>

The shell shall format the output, including the proper use of quoting, so that it is suitable for re-input to the shell as commands that achieve the same exporting results.

The export special built-in shall conform to the utility argument syntax guidelines described in 2.10.2.
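The effect of the export attribute can be observed from a child process. This sketch assumes a POSIX sh; the function export_demo and the variable DEMO_VAR are hypothetical names. The ${DEMO_VAR-unset} expansion in the child reports unset whenever the child's environment lacks the variable.

```shell
#!/bin/sh
# Hypothetical demo: a child process sees a variable only after it has been
# given the export attribute.
export_demo() {
    unset DEMO_VAR
    DEMO_VAR=visible
    before=$(sh -c 'echo "${DEMO_VAR-unset}"')   # not exported yet
    export DEMO_VAR
    after=$(sh -c 'echo "${DEMO_VAR-unset}"')    # now in the child's environment
    echo "before=$before after=$after"
}

export_demo    # expected: before=unset after=visible
```

The single quotes around the child's command keep the parent shell from expanding DEMO_VAR itself, so only the child's environment is consulted.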
Exit Status

Zero.

BEGIN_RATIONALE

3.14.8.1 export Rationale. (This subclause is not a part of P1003.2)

When no arguments are given, the results are unspecified. Some historical shells use the no-argument case as the functional equivalent of what is required here with -p. This feature was left unspecified because it is not existing practice in all shells and some scripts may rely on the now-unspecified results on their implementations. Attempts to specify the -p output as the default case were unsuccessful in achieving consensus. The -p option was added to allow portable access to the values that can be saved and then later restored using, for instance, a dot script.

Examples:

Export PWD and HOME variables:

    export PWD HOME

Set and export the PATH variable:

    export PATH=/local/bin:$PATH

Save and restore all exported variables:

    export -p > temp-file
    unset a lot of variables
    ... processing
    . temp-file

END_RATIONALE

3.14.9 readonly - Set read-only attribute for variables

    readonly name[=word]...
    readonly -p

The variables whose names are specified shall be given the readonly attribute. The values of variables with the read-only attribute cannot be changed by subsequent assignment, nor can those variables be unset by the unset utility.
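A sketch of these read-only restrictions, assuming a POSIX sh; readonly_demo and R are hypothetical names. Each attempt runs in its own subshell, since in a noninteractive shell an assignment to, or unset of, a read-only variable is an error that may terminate the shell. Exit status values differ between shells, so only success or failure is reported.

```shell
#!/bin/sh
# Hypothetical demo: a read-only variable can be neither reassigned nor unset.
# Each attempt runs in its own subshell, because the error may terminate a
# noninteractive shell; only success or failure is reported.
readonly_demo() {
    if ( readonly R=1; R=2 ) 2>/dev/null; then a=succeeded; else a=failed; fi
    if ( readonly R=1; unset R ) 2>/dev/null; then u=succeeded; else u=failed; fi
    echo "assign=$a unset=$u"
}

readonly_demo    # expected: assign=failed unset=failed
```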
When -p is specified, readonly shall write to the standard output the names and values of all read-only variables, in the following format:

    "readonly %s=%s\n", <name>, <value>

The shell shall format the output, including the proper use of quoting, so that it is suitable for re-input to the shell as commands that achieve the same attribute-setting results.

The readonly special built-in shall conform to the utility argument syntax guidelines described in 2.10.2.

Exit Status

Zero.

BEGIN_RATIONALE

3.14.9.1 readonly Rationale. (This subclause is not a part of P1003.2)

Example:

    readonly HOME PWD

Some versions of the shell exist that preserve the read-only attribute across separate invocations. POSIX.2 allows this behavior, but does not require it.

See the rationale for export (3.14.8.1) for a description of the no-argument and -p output cases.

In a previous draft, read-only functions were considered, but they were omitted as not being existing practice or particularly useful. Furthermore, functions must not be read-only across invocations to preclude spoofing (spoofing is the term for the practice of creating a program that acts like a well-known utility with the intent of subverting the user's real intent) of administrative or security-relevant (or -conscious) shell scripts.

END_RATIONALE

3.14.10 return - Return from a function

    return [n]

The return utility shall cause the shell to stop executing the current function or dot script (see 3.14.4). If the shell is not currently executing a function or dot script, the results are unspecified.

Exit Status

The value of the special parameter ?
shall be set to n, an unsigned decimal integer, or to the exit status of the last command executed if n is not specified. If the value of n is greater than 255, the results are undefined. When return is executed in a trap action (see 3.14.13), the ``last command'' is considered to be the command that executed immediately preceding the trap action.

BEGIN_RATIONALE

3.14.10.1 return Rationale. (This subclause is not a part of P1003.2)

The behavior of return when not in a function or dot script differs between the System V shell and the KornShell. In the System V shell this is an error, whereas in the KornShell, the effect is the same as exit.

The results of returning a number greater than 255 are undefined because of differing practices in the various historical implementations. Some shells AND out all but the low-order 8 bits; others allow larger values, but not of unlimited size.

See the discussion of appropriate exit status values in 3.14.7.1.

END_RATIONALE

3.14.11 set - Set/unset options and positional parameters

    set [-aCefnuvx] [argument ...]
    set [+aCefnuvx] [argument ...]
    set -- [argument ...]

Obsolescent version:

    set - [argument ...]

If no options or arguments are specified, set shall write the names and values of all shell variables in the collation sequence of the current locale. Each name shall start on a separate line, using the format:

    "%s=%s\n", <name>, <value>

The value string shall be written with appropriate quoting so that it is suitable for re-input to the shell, (re)setting, as far as possible, the variables that are currently set. Read-only variables cannot be reset. See the description of shell quoting in 3.2.
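The re-input property of this output can be exercised on a single variable. In the sketch below, set_roundtrip_demo and ROUNDTRIP_VAR are hypothetical names, and a POSIX sh plus a value without newlines are assumed. The variable's line is captured from set with grep, the variable is unset, and eval of the saved line restores it. The exact quoting of the saved line varies between shells, so the restored value is compared rather than the line itself.

```shell
#!/bin/sh
# Hypothetical demo: save one variable's line from the no-argument output of
# set, unset the variable, then restore it by re-input with eval.
set_roundtrip_demo() {
    ROUNDTRIP_VAR='a b $c'                   # value needing quoting on output
    saved=$(set | grep '^ROUNDTRIP_VAR=')    # exact quoting varies by shell
    unset ROUNDTRIP_VAR
    eval "$saved"                            # re-input restores the value
    printf '%s\n' "$ROUNDTRIP_VAR"
}

set_roundtrip_demo    # expected: a b $c
```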
When options are specified, they shall set or unset attributes of the shell, as described below. When arguments are specified, they shall cause positional parameters to be set or unset, as described below. Setting/unsetting attributes and positional parameters are not necessarily related actions, but they can be combined in a single invocation of set.

The set utility shall conform to the utility argument syntax guidelines described in 2.10.2, except that options can be specified with either a leading hyphen (meaning enable the option) or plus-sign (meaning disable it).

The implementation shall support the options in the following list in both their hyphen and plus-sign forms. These options can also be specified as options to sh; see 4.56.

-a   When this option is on, the export attribute shall be set for each variable to which an assignment is performed. (See 3.1.15.) If the assignment precedes a utility name in a command, the export attribute shall not persist in the current execution environment after the utility completes, with the exception that preceding one of the special built-in utilities shall cause the export attribute to persist after the built-in has completed. If the assignment does not precede a utility name in the command, or if the assignment is a result of the operation of the getopts or read utilities (see 4.27 and 4.52), the export attribute shall persist until the variable is unset.

-C   (Uppercase C.) Prevent existing files from being overwritten by the shell's > redirection operator (see 3.7.2); the >| redirection operator shall override this ``noclobber'' option for an individual file.
-e  When this option is on, if a simple command fails for any of the reasons listed in 3.8.1 or returns an exit status value >0, and is not part of the compound list following a while, until, or if keyword, and is not a part of an AND or OR list, and is not a pipeline preceded by the ! reserved word, then the shell shall immediately exit.

-f  The shell shall disable pathname expansion.

-n  The shell shall read commands but not execute them; this can be used to check for shell script syntax errors. An interactive shell may ignore this option.

-u  The shell shall write a message to standard error when it tries to expand a variable that is not set and immediately exit. An interactive shell shall not exit.

-v  The shell shall write its input to standard error as it is read.

-x  The shell shall write to standard error a trace for each command after it expands the command and before it executes it.

The default for all these options is off (unset) unless the shell was invoked with them on (see sh in 4.56).

All the positional parameters shall be unset before any new values are assigned. The remaining arguments shall be assigned in order to the positional parameters. The special parameter # shall be set to reflect the number of positional parameters.

The special argument "--" immediately following the set command name can be used to delimit the arguments if the first argument begins with + or -, or to prevent inadvertent listing of all shell variables when there are no arguments. The command set -- without arguments shall unset all positional parameters and set the special parameter # to zero.

In the obsolescent version, the set command name followed by - with no other arguments shall turn off the -v and -x options without changing the positional parameters.
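The following sketch exercises three of the behaviors just described: the -C (noclobber) option together with the >| override, the use of -- so that an argument beginning with - is assigned rather than parsed as an option, and the exemption of if conditions and AND-OR lists from -e. The scratch file name is a hypothetical choice, and the -e check runs in a child sh so that its final failing command ends only that child.

```shell
#!/bin/sh
# Hypothetical scratch file used only for this illustration.
tmp=${TMPDIR:-/tmp}/set_demo.$$
echo first > "$tmp"

set -C                                  # noclobber on
if (echo second > "$tmp") 2>/dev/null; then
    clobber=allowed
else
    clobber=refused                     # ">" may not overwrite an existing file
fi
echo third >| "$tmp"                    # ">|" overrides noclobber for this file
set +C                                  # noclobber off again
content=$(cat "$tmp")
rm -f "$tmp"

set -- -x "a b" c                       # "--" keeps -x from being read as an option
count=$# first=$1
set --                                  # no arguments: unset all positional parameters
after=$#

# -e: failures in an if condition or an AND-OR list do not exit the child shell,
# but the bare "false" does, so "unreachable" is never printed.
# The child then exits with status 1; "|| :" keeps that from mattering here.
errexit=$(sh -c 'set -e; if false; then :; fi; false || true; echo survived; false; echo unreachable') || :

echo "$clobber $content $count $first $after $errexit"
```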
The set command name followed by - with other arguments shall turn off the -v and -x options and assign the arguments to the positional parameters in order.

Exit Status

Zero.

BEGIN_RATIONALE

3.14.11.1 set Rationale. (This subclause is not a part of P1003.2)

The set -- form is listed specifically in the Synopsis even though this usage is implied by the utility syntax guidelines. The explanation of this feature removes any ambiguity about whether the set -- form might be misinterpreted as being equivalent to set without any options or arguments. The functionality of this form has been adopted from the KornShell. In System V, set -- only unsets parameters if there is at least one argument; the only way to unset all parameters is to use shift. Using the KornShell version should not affect System V scripts because there should be no reason to deliberately issue it without arguments; if it were issued as, say:

    set -- "$@"

and there were in fact no arguments resulting from $@, unsetting the parameters would be a no-op anyway.

The set + form in earlier drafts was omitted as being an unnecessary duplication of set alone and not widespread historical practice.

The noclobber option was changed to -C from the set -o noclobber option in previous drafts. The set -o form is used in the KornShell to accept word-length option names, duplicating many of the single-letter names. The noclobber option was changed to a single letter so that the historical $- paradigm would not be broken; see 3.5.2.

The following set flags were intentionally omitted with the following rationale:

-h  This flag is related to command name hashing, which is not required for an implementation. It is primarily a performance issue, which is outside the scope of this standard.
-k  The -k flag was originally added by Bourne to make it easier for users of prerelease versions of the shell. In early versions of the Bourne shell the construct set name=value had to be used to assign values to shell variables. The problem with -k is that the behavior affects parsing, virtually precluding writing any compilers. To explain the behavior of -k, it is necessary to describe the parsing algorithm, which is implementation defined. For example,

        set -k; echo name=value

    and

        set -k
        echo name=value

    behave differently. The interaction with functions is even more complex. What is more, the -k flag is never needed, since the command line could have been reordered.

-t  The -t flag is hard to specify and almost never used. The only known use could be done with here-documents. Moreover, the behavior of ksh and sh differs. The man page says that it exits after reading and executing one command. What is one command? If the input is date;date, sh executes both date commands, ksh does only the first.

Consideration was given to rewriting set to simplify its confusing syntax. A specific suggestion was that the unset utility should be used to unset options instead of using the non-getopt()-able +option syntax. However, the conclusion was reached that people were satisfied with the existing practice of using +option and there was no compelling reason to modify such widespread existing practice.
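The two rationale points about $- and set -- "$@" can be checked directly in a sketch like the following. It assumes the shell was not itself started with -C. Because noclobber is a single letter, it appears in $- while it is on, which lets a script test and restore it. Re-setting the parameters from an empty "$@" leaves $# at zero.

```shell
#!/bin/sh
# The single-letter C appears in $- only while noclobber is on.
case $- in *C*) before=on ;; *) before=off ;; esac
set -C
case $- in *C*) during=on ;; *) during=off ;; esac
set +C

set --                   # start with no positional parameters
set -- "$@"              # "$@" expands to nothing, so this is a no-op
empty_count=$#

echo "$before $during $empty_count"
```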
Examples:

Write out all variables and their values:

    set

Set $1, $2, and $3 and set $# to 3:

    set c a b

Turn on the -x and -v options:

    set -xv

Unset all positional parameters:

    set --

Set $1 to the value of x, even if x begins with - or +:

    set -- "$x"

Set the positional parameters to the expansion of x, even if x expands with a leading - or +:

    set -- $x

END_RATIONALE

3.14.12 shift - Shift positional parameters

    shift [n]

The positional parameters shall be shifted. Positional parameter 1 shall be assigned the value of parameter (1+n), parameter 2 shall be assigned the value of parameter (2+n), and so forth. The parameters represented by the numbers $# down to $#-n+1 shall be unset, and the parameter # shall be updated to reflect the new number of positional parameters.

The value n shall be an unsigned decimal integer less than or equal to the value of the special parameter #. If n is not given, it shall be assumed to be 1. If n is 0, the positional and special parameters shall not be changed.

Exit Status

The exit status shall be >0 if n>$#; otherwise, it shall be zero.

BEGIN_RATIONALE

3.14.12.1 shift Rationale. (This subclause is not a part of P1003.2)

Example:

    set a b c d e
    shift 2
    echo $*

writes:

    c d e

END_RATIONALE

3.14.13 trap - Trap signals

    trap [action condition ...]

If action is -, the shell shall reset each condition to the default value. If action is null (''), the shell shall ignore each of the specified conditions if they arise. Otherwise, the argument action shall be read and executed by the shell when one of the corresponding conditions arises. The action of the trap shall override a previous action (either default action or one explicitly set). The value of $?
after the trap action completes shall be the value it had before the trap was invoked.

The condition can be EXIT, 0 (equivalent to EXIT), or a signal specified using a symbolic name, without the SIG prefix, as listed in Required Signals and Job Control Signals (Table 3-1 and Table 3-2 in POSIX.1 {8}). (For example: HUP, INT, QUIT, TERM.) Setting a trap for SIGKILL or SIGSTOP produces undefined results.

The environment in which the shell executes a trap on EXIT shall be identical to the environment immediately after the last command executed before the trap on EXIT was taken.

Each time the trap is invoked, the action argument shall be processed in a manner equivalent to:

    eval "$action"

Signals that were ignored on entry to a noninteractive shell cannot be trapped or reset, although no error need be reported when attempting to do so. An interactive shell may reset or catch signals ignored on entry. Traps shall remain in place for a given shell until explicitly changed with another trap command.

The trap command with no arguments shall write to standard output a list of commands associated with each condition. The format is:

    "trap -- %s %s ...\n", <action>, <condition> ...

The shell shall format the output, including the proper use of quoting, so that it is suitable for re-input to the shell as commands that achieve the same trapping results.

An implementation may allow numeric signal numbers for the conditions as an extension, if and only if the following map of signal numbers to names is true:

    Signal Number   Signal Name
    -------------   -----------
          1         SIGHUP
          2         SIGINT
          3         SIGQUIT
          6         SIGABRT
          9         SIGKILL
         14         SIGALRM
         15         SIGTERM

Otherwise, it shall be an error for the application to use numeric signal numbers.
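A minimal sketch of the action semantics above, run in child shells so that the signals and EXIT traps do not affect the calling script. It assumes that kill accepts -s with a symbolic signal name and that TERM was not already ignored when the child shells started; the strings passed to echo are arbitrary labels.

```shell
#!/bin/sh
# Child shell 1: an action for TERM and an action for EXIT.
# The TERM action runs once kill returns; the EXIT action runs last.
caught=$(sh -c '
    trap "echo caught TERM" TERM
    trap "echo cleanup" EXIT
    kill -s TERM $$          # $$ is the child shell, not this script
    echo after-signal
')

# Child shell 2: a null action means the signal is ignored.
ignored=$(sh -c 'trap "" INT; kill -s INT $$; echo still-running')

# Child shell 3: the EXIT action runs in the environment left by the last
# command, so $? still holds the status of false. The child exits with
# status 1; "|| :" keeps that from mattering here.
exit_seen=$(sh -c 'trap "echo status=\$?" EXIT; false') || :

printf '%s\n%s\n%s\n' "$caught" "$ignored" "$exit_seen"
```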
The trap special built-in shall conform to the utility argument syntax guidelines described in 2.10.2.

Exit Status

If the trap name or number is invalid, a nonzero exit status shall be returned; otherwise, zero shall be returned. For both interactive and noninteractive shells, invalid signal names or numbers shall not be considered a syntax error and shall not cause the shell to abort.

BEGIN_RATIONALE

3.14.13.1 trap Rationale. (This subclause is not a part of P1003.2)

Implementations may permit lowercase signal names as an extension. Implementations may also accept the names with the SIG prefix; no known historical shell does so. The trap and kill utilities in POSIX.2 are now consistent in their omission of the SIG prefix for signal names. Some kill implementations do not allow the prefix, and kill -l lists the signals without prefixes.

As stated previously, when a subshell is entered, traps are set to the default actions. This does not imply that the trap command cannot be used within the subshell to set new traps.

Trapping SIGKILL or SIGSTOP is accepted by some historical implementations, but it does not work. Portable POSIX.2 applications cannot try it.

The output format is not historical practice. Since the output of historical traps is not portable (because numeric signal values are not portable) and had to change to become so, an opportunity was taken to format the output in a way that a shell script could use to save and then later reuse a trap if it wanted. For example:

    save_traps=$(trap)
    ...
    eval "$save_traps"

The KornShell uses an ERR trap that is triggered whenever set -e would cause an exit. This is allowable as an extension, but was not mandated, as other shells have not used it.
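The statement that traps revert to their defaults in a subshell, while new traps may still be set there, can be observed with a sketch like this one. Everything runs in a child sh, and the echo strings are arbitrary labels.

```shell
#!/bin/sh
out=$(sh -c '
    trap "echo parent-exit" EXIT
    ( echo in-subshell )                  # inherited EXIT trap is reset: nothing extra printed
    ( trap "echo subshell-exit" EXIT      # a subshell may set its own trap
      echo in-second-subshell )
    echo parent-done
')
printf '%s\n' "$out"
```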
The text about the environment for the EXIT trap invalidates the behavior of some historical versions of interactive shells which, e.g., close the standard input before executing a trap on 0. For example, in some historical interactive shell sessions the following trap on 0 would always print --:

    trap 'read foo; echo "-$foo-"' 0

Examples:

Write out a list of all traps and actions:

    trap

Set a trap so the logout utility in the HOME directory will execute when the shell terminates:

    trap '$HOME/logout' EXIT

or

    trap '$HOME/logout' 0

Unset traps on INT, QUIT, TERM, and EXIT:

    trap - INT QUIT TERM EXIT

END_RATIONALE

3.14.14 unset - Unset values and attributes of variables and functions

    unset [-fv] name ...

Each variable or function specified by name shall be unset.

If -v is specified, name refers to a variable name and the shell shall unset it and remove it from the environment. Read-only variables cannot be unset.

If -f is specified, name refers to a function and the shell shall unset the function definition.

If neither -f nor -v is specified, name refers to a variable; if a variable by that name does not exist, it is unspecified whether a function by that name, if any, shall be unset.

Unsetting a variable or function that was not previously set shall not be considered an error and shall not cause the shell to abort.

The unset special built-in shall conform to the utility argument syntax guidelines described in 2.10.2.

Exit Status

0   All names were successfully unset.
>0  At least one name could not be unset.

BEGIN_RATIONALE

3.14.14.1 unset Rationale. (This subclause is not a part of P1003.2)

Note that VARIABLE= is not equivalent to an unset of VARIABLE; in the example, VARIABLE is set to "".
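The following sketch contrasts assigning an empty value with unsetting. It uses the ${name+word} expansion, which yields word only when name is set, to tell "set but empty" from "unset". The names EMPTY, never_set, and demo_fn are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
VISUAL=vi
EMPTY=                              # set, with the null string as its value
demo_fn() { echo hello; }           # hypothetical function for illustration

unset -v VISUAL                     # -v: the name refers to a variable
unset -f demo_fn                    # -f: the name refers to a function
unset -v never_set                  # not previously set: not an error
unset_status=$?

visual_state=${VISUAL+set}          # expands to "set" only if VISUAL is set
empty_state=${EMPTY+set}            # "set": EMPTY= did not unset EMPTY

if command -v demo_fn >/dev/null 2>&1; then
    fn_state=defined
else
    fn_state=gone
fi

echo "visual=[$visual_state] empty=[$empty_state] fn=$fn_state status=$unset_status"
```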
Also, the ``variables'' that can be unset should not be misinterpreted to include the special parameters (see 3.5.2).

Consideration was given to omitting the -f option in favor of an unfunction utility, but the decision was made to retain existing practice.

The -v option was introduced because System V historically used one name space for both variables and functions. When unset is used without options, System V historically unset either a function or a variable and there was no confusion about which one was intended. A portable POSIX.2 application can use unset without an option to unset a variable, but not a function; the -f option must be used.

Examples:

Unset the VISUAL variable:

    unset -v VISUAL

Unset the functions foo and bar:

    unset -f foo bar

END_RATIONALE

Section 4: Execution Environment Utilities

The Execution Environment Utilities are the utilities that shall be implemented in all conforming POSIX.2 systems.

4.1 awk - Pattern scanning and processing language

4.1.1 Synopsis

    awk [-F ERE] [-v assignment] ... program [argument ...]
    awk [-F ERE] -f progfile ... [-v assignment] ... [argument ...]

4.1.2 Description

The awk utility shall execute programs written in the awk programming language, which is specialized for textual data manipulation. An awk program is a sequence of patterns and corresponding actions. When input is read that matches a pattern, the action associated with that pattern shall be carried out.

Input shall be interpreted as a sequence of records. By default, a record is a line, but this can be changed by using the RS built-in variable.
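A small sketch of the pattern-action model: each input record (a line, by default) is tested against each pattern, and the action runs for every match. The input text and counting logic are illustrative assumptions. The second command sets RS to the null string, which in awk makes records separated by blank lines instead of single <newline>s.

```shell
#!/bin/sh
# Count records matching /error/ and print field 3 of records matching /disk/.
result=$(printf 'ok boot\nerror disk full\nok net\nerror disk slow\n' |
    awk '/error/ { n++ }
         /disk/  { print $3 }
         END     { print "errors:", n + 0 }')

# With RS set to the null string, a blank line separates records.
paras=$(printf 'a\nb\n\nc\n' | awk 'BEGIN { RS = "" } END { print NR }')

printf '%s\n%s\n' "$result" "$paras"
```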
Each record of input shall be matched in turn against each pattern in the program. For each pattern matched, the associated action shall be executed.

The awk utility shall interpret each input record as a sequence of fields where, by default, a field is a string of non-<blank> characters. This default white-space field delimiter can be changed by using the FS built-in variable or the -F ERE option. The awk utility shall denote the first field in a record $1, the second $2, and so forth. The symbol $0 shall refer to the entire record; setting any other field shall cause the reevaluation of $0. Assigning to $0 shall reset the values of all other fields and the NF built-in variable.

4.1.3 Options

The awk utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-F ERE  Define the input field separator to be the extended regular expression ERE, before any input is read (see 4.1.7.4).

-f progfile  Specifies the pathname of the file progfile containing an awk program. If multiple instances of this option are specified, the concatenation of the files specified as progfile in the order specified shall be the awk program. The awk program can alternatively be specified in the command line as a single argument.

-v assignment  The assignment argument shall be in the same form as an assignment operand. The specified variable assignment shall occur prior to executing the awk program, including the actions associated with BEGIN patterns (if any). Multiple occurrences of this option can be specified.
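The field and option rules above can be exercised as follows. The sample record and the variable name label are hypothetical. -F sets the field separator before any input is read, -v makes label available even in BEGIN, assigning to a field rebuilds $0 with OFS, and assigning to $0 re-splits the record and resets NF.

```shell
#!/bin/sh
out=$(printf 'a:b:c\n' | awk -F ':' -v label=rec '
    BEGIN { printf "%s ", label }      # -v assignments are visible in BEGIN
    {
        nf_before = NF                 # 3 fields split on ":"
        OFS = "-"
        $2 = "B"                       # changing a field rebuilds $0 using OFS
        rebuilt = $0
        $0 = "x:y"                     # assigning $0 re-splits it, resetting NF
        print nf_before " " rebuilt " " NF " " $2
    }')
echo "$out"
```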
4.1.4 Operands

The following operands shall be supported by the implementation:

program  If no -f option is specified, the first operand to awk shall be the text of the awk program. The application shall supply the program operand as a single argument to awk. If the text does not end in a <newline> character, awk shall interpret the text as if it did.

argument  Either of the following two types of argument can be intermixed:

    file  A pathname of a file that contains the input to be read, which is matched against the set of patterns in the program. If no file operands are specified, or if a file operand is -, the standard input shall be used.

    assignment  An operand that begins with an underscore or alphabetic character from the portable character set (see Table 2-3 in 2.4), followed by a sequence of underscores, digits, and alphabetics from the portable character set, followed by the = character, shall specify a variable assignment rather than a pathname. The characters before the = shall represent the name of an awk variable; if that name is an awk reserved word (see 4.1.7.7) the behavior is undefined. The characters following the equals-sign shall be interpreted as if they appeared in the awk program preceded and followed by a double-quote (") character, as a STRING token (see 4.1.7.7), except that if the last character is an unescaped backslash, it shall be interpreted as a literal backslash rather than as the first character of the sequence ``\"''. The variable shall be assigned the value of that STRING token. If that value is considered a numeric string (see 4.1.7.2), the variable shall also be assigned its numeric value. Each such variable assignment shall occur just prior to the processing of the following file, if any.
    Thus, an assignment before the first file argument shall be executed after the BEGIN actions (if any), while an assignment after the last file argument shall occur before the END actions (if any). If there are no file arguments, assignments shall be executed before processing the standard input.

4.1.5 External Influences

4.1.5.1 Standard Input

The standard input shall be used only if no file operands are specified, or if a file operand is -. See Input Files.

4.1.5.2 Input Files

Input files to the awk program from any of the following sources:

  - Any file operands or their equivalents, achieved by modifying the awk variables ARGV and ARGC
  - Standard input in the absence of any file operands
  - Arguments to the getline function

shall be text files. Whether the variable RS is set to a value other than <newline> or not, for these files, the implementation shall support records terminated with the specified separator up to {LINE_MAX} bytes and may support longer records.

If -f progfile is specified, the file(s) named by progfile shall be text file(s) containing an awk program.

4.1.5.3 Environment Variables

The following environment variables shall affect the execution of awk:

LANG  This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL  This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
LC_CTYPE  This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files), the behavior of character classes within regular expressions, the identification of characters as letters, and the mapping of upper- and lowercase characters for the toupper and tolower functions.

LC_COLLATE  This variable shall determine the locale for the behavior of ranges, equivalence classes, and multicharacter collating elements within regular expressions and in comparisons of string values.

LC_MESSAGES  This variable shall determine the language in which messages should be written.

LC_NUMERIC  This variable shall determine the radix character used when interpreting numeric input, performing conversions between numeric and string values, and formatting numeric output.

PATH  This variable shall define the search path when looking for commands executed by system(expr), or input and output pipes. See 2.6.

In addition, all environment variables shall be visible via the awk variable ENVIRON.

4.1.5.4 Asynchronous Events

Default.

4.1.6 External Effects

4.1.6.1 Standard Output

The nature of the output files depends on the awk program.

4.1.6.2 Standard Error

Used only for diagnostic messages.

4.1.6.3 Output Files

The nature of the output files depends on the awk program.

4.1.7 Extended Description

4.1.7.1 Overall Program Structure

An awk program is composed of pairs of the form:

    pattern { action }

Either the pattern or the action (including the enclosing brace characters) can be omitted.

A missing pattern shall match any record of input, and a missing action shall be equivalent to an action that writes the matched record of input to standard output.
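Two consequences of the program-structure and Operands rules are sketched here, using a hypothetical pair of scratch files. A pattern with no action prints each matching record. An assignment operand placed before a file operand takes effect just before that file is read, which is after BEGIN has run and, for the last one, before END.

```shell
#!/bin/sh
dir=${TMPDIR:-/tmp}/awk_demo.$$       # hypothetical scratch directory
mkdir "$dir" || exit 1
printf 'one\ntwo\n' > "$dir/f1"
printf 'three\n' > "$dir/f2"

# A pattern without an action prints each matching record.
matched=$(awk 'NR == 2' "$dir/f1")

# tag is assigned before f1 and reassigned before f2;
# BEGIN runs before either assignment, END after both.
tagged=$(awk 'BEGIN { print "begin:" tag }
              { print tag ":" $0 }
              END { print "end:" tag }' tag=A "$dir/f1" tag=B "$dir/f2")

rm -r "$dir"
printf '%s\n%s\n' "$matched" "$tagged"
```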
Execution of the awk program shall start by first executing the actions associated with all BEGIN patterns in the order they occur in the program. Then each file operand (or standard input if no files were specified) shall be processed in turn by reading data from the file until a record separator is seen (<newline> by default), splitting the current record into fields using the current value of FS according to the rules in 4.1.7.4, evaluating each pattern in the program in the order of occurrence, and executing the action associated with each pattern that matches the current record. The action for a matching pattern shall be executed before evaluating subsequent patterns. Last, the actions associated with all END patterns shall be executed in the order they occur in the program.

4.1.7.2 Expressions

Table 4-1 - awk Expressions in Decreasing Precedence

Syntax                 Name                       Definition      Type of Result    Assoc.
------------------------------------------------------------------------------------------
( expr )               Grouping                   C Standard {7}  type of expr      n/a
------------------------------------------------------------------------------------------
$expr                  Field reference            4.1.7.2         string            n/a
------------------------------------------------------------------------------------------
++ lvalue              Pre-increment              C Standard {7}  numeric           n/a
-- lvalue              Pre-decrement              C Standard {7}  numeric           n/a
lvalue ++              Post-increment             C Standard {7}  numeric           n/a
lvalue --              Post-decrement             C Standard {7}  numeric           n/a
------------------------------------------------------------------------------------------
expr ^ expr            Exponentiation             4.1.7.2         numeric           right
------------------------------------------------------------------------------------------
! expr                 Logical not                C Standard {7}  numeric           n/a
+ expr                 Unary plus                 C Standard {7}  numeric           n/a
- expr                 Unary minus                C Standard {7}  numeric           n/a
------------------------------------------------------------------------------------------
expr * expr            Multiplication             C Standard {7}  numeric           left
expr / expr            Division                   C Standard {7}  numeric           left
expr % expr            Modulus                    4.1.7.2         numeric           left
------------------------------------------------------------------------------------------
expr + expr            Addition                   C Standard {7}  numeric           left
expr - expr            Subtraction                C Standard {7}  numeric           left
------------------------------------------------------------------------------------------
expr expr              String concatenation       4.1.7.2         string            left
------------------------------------------------------------------------------------------
expr < expr            Less than                  4.1.7.2         numeric           none
expr <= expr           Less than or equal to      4.1.7.2         numeric           none
expr != expr           Not equal to               4.1.7.2         numeric           none
expr == expr           Equal to                   4.1.7.2         numeric           none
expr > expr            Greater than               4.1.7.2         numeric           none
expr >= expr           Greater than or equal to   4.1.7.2         numeric           none
------------------------------------------------------------------------------------------
expr ~ expr            ERE match                  4.1.7.4         numeric           none
expr !~ expr           ERE nonmatch               4.1.7.4         numeric           none
------------------------------------------------------------------------------------------
expr in array          Array membership           4.1.7.2         numeric           left
( index ) in array     Multidimension array       4.1.7.2         numeric           left
                       membership
------------------------------------------------------------------------------------------
expr && expr           Logical AND                C Standard {7}  numeric           left
------------------------------------------------------------------------------------------
expr || expr           Logical OR                 C Standard {7}  numeric           left
------------------------------------------------------------------------------------------
expr1 ? expr2 : expr3  Conditional expression     C Standard {7}  type of selected  right
                                                                  expr2 or expr3
------------------------------------------------------------------------------------------
lvalue ^= expr         Exponentiation assignment  4.1.7.2         numeric           right
lvalue %= expr         Modulus assignment         4.1.7.2         numeric           right
lvalue *= expr         Multiplication assignment  C Standard {7}  numeric           right
lvalue /= expr         Division assignment        C Standard {7}  numeric           right
lvalue += expr         Addition assignment        C Standard {7}  numeric           right
lvalue -= expr         Subtraction assignment     C Standard {7}  numeric           right
lvalue = expr          Assignment                 C Standard {7}  type of expr      right
------------------------------------------------------------------------------------------

Expressions describe computations used in patterns and actions. In Table 4-1, valid expression operations are given in groups from highest precedence first to lowest precedence last, with equal-precedence operators grouped between horizontal lines. In expression evaluation, higher precedence operators shall be evaluated before lower precedence operators. In this table expr, expr1, expr2, and expr3 represent any expression, while lvalue represents any entity that can be assigned to (i.e., on the left side of an assignment operator). The precise syntax of expressions is given in the grammar in 4.1.7.7.

Each expression shall have either a string value, a numeric value, or both.
Except as stated for specific contexts, the value of an expression shall be implicitly converted to the type needed for the context in which it is used. A string value shall be converted to a numeric value by the equivalent of the following calls to functions defined by the C Standard {7}:

    setlocale(LC_NUMERIC, "");
    numeric_value = atof(string_value);

A numeric value that is exactly equal to the value of an integer (see 2.9.2.1) shall be converted to a string by the equivalent of a call to the sprintf function (see 4.1.7.6.2) with the string "%d" as the fmt argument and the numeric value being converted as the first and only expr argument. Any other numeric value shall be converted to a string by the equivalent of a call to the sprintf function with the value of the variable CONVFMT as the fmt argument and the numeric value being converted as the first and only expr argument. The result of the conversion is unspecified if the value of CONVFMT is not a floating-point format specification. This standard specifies no explicit conversions between numbers and strings. An application can force an expression to be treated as a number by adding zero to it, or can force it to be treated as a string by concatenating the null string ("") to it.

A string value shall be considered to be a numeric string in the following case:

(1) Any leading and trailing <blank>s shall be ignored.

(2) If the first unignored character is a + or -, it shall be ignored.

(3) If the remaining unignored characters would be lexically recognized as a NUMBER token (as described by the lexical conventions in 4.1.7.7), the string shall be considered a numeric string.
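The conversion and numeric-string rules can be observed directly in a sketch like the following; the input values are chosen only for illustration and a C (or POSIX) locale is assumed. An integral value converts as if by "%d" regardless of CONVFMT, a non-integral value uses CONVFMT, an input field that looks like a number compares numerically, and string constants compare as strings.

```shell
#!/bin/sh
conv=$(awk 'BEGIN {
    CONVFMT = "%.2g"
    pi = 3.14159
    a = pi ""           # not an integer: converted with CONVFMT
    b = 17 ""           # integral value: converted as if by "%d"
    print a, b
}')

cmp=$(echo '1e2 abc 10' | awk '{
    x = ($1 == 100)     # "1e2" is a numeric string: numeric comparison
    y = ($2 == 0)       # "abc" is not: compared as strings, so false
    z = ($3 < 9)        # numeric: 10 < 9 is false
    w = ("10" < "9")    # string constants: string comparison, true
    print x, y, z, w
}')
echo "$conv / $cmp"
```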
If a - character is ignored in the above steps, the numeric value of the numeric string shall be the negation of the numeric value of the recognized NUMBER token. Otherwise the numeric value of the numeric string shall be the numeric value of the recognized NUMBER token. Whether or not a string is a numeric string shall be relevant only in contexts where that term is used in this clause.

When an expression is used in a Boolean context (the first subexpression of a conditional expression, an expression operated on by logical NOT, logical AND, or logical OR, the second expression of a for statement, the expression of an if statement, or the expression of a while statement), if it has a numeric value, a value of zero shall be treated as false and any other value shall be treated as true. Otherwise, a string value of the null string shall be treated as false and any other value shall be treated as true.

All arithmetic shall follow the semantics of floating-point arithmetic as specified by the C Standard {7}; see 2.9.2.

The value of the expression

    expr1 ^ expr2

shall be equivalent to the value returned by the C Standard {7} function call

    pow(expr1, expr2)

The expression

    lvalue ^= expr

shall be equivalent to the C Standard {7} expression

    lvalue = pow(lvalue, expr)

except that lvalue shall be evaluated only once. The value of the expression

    expr1 % expr2

shall be equivalent to the value returned by the C Standard {7} function call

    fmod(expr1, expr2)

The expression

    lvalue %= expr

shall be equivalent to the C Standard {7} expression

    lvalue = fmod(lvalue, expr)

except that lvalue shall be evaluated only once.
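A short sketch of the arithmetic and Boolean-context rules above, with arbitrary operand values. Because % follows fmod(), the result takes the sign of the dividend, and because unary minus binds more tightly than %, -7 % 3 means (-7) % 3. A string constant such as "0" counts as true because only its string value is consulted.

```shell
#!/bin/sh
out=$(awk 'BEGIN {
    p = 2 ^ 10                       # pow(2, 10)
    m = -7 % 3                       # fmod(-7, 3): sign follows the dividend
    f = 7.5 % 2                      # fmod works on non-integers too
    x = 2; x ^= 3                    # x = pow(x, 3)
    s = ("0" ? "true" : "false")     # non-null string constant: true
    n = (0 ? "true" : "false")       # numeric zero: false
    print p, m, f, x, s, n
}')
echo "$out"
```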
Variables and fields shall be set by the assignment statement:

    lvalue = expression

and the type of expression shall determine the resulting variable type. The assignment includes the arithmetic assignments (+=, -=, *=, /=, %=, ^=, ++, --), all of which produce a numeric result. The left-hand side of an assignment and the target of increment and decrement operators can be one of a variable, an array with index, or a field selector.

The awk language shall supply arrays that are used for storing numbers or strings. Arrays need not be declared. They shall initially be empty, and their sizes shall change dynamically. The subscripts, or element identifiers, are strings, providing a type of associative array capability. An array name followed by a subscript within square brackets can be used as an lvalue and thus as an expression, as described in the grammar (see 4.1.7.7). Unsubscripted array names can be used in only the following contexts:

  - A parameter in a function definition or function call.

  - The NAME token following any use of the keyword in as specified in the grammar (see 4.1.7.7). If the name used in this context is not an array name, the behavior is undefined.

A valid array index shall consist of one or more comma-separated expressions, similar to the way in which multidimensional arrays are indexed in some programming languages. Because awk arrays are really one-dimensional, such a comma-separated list shall be converted to a single string by concatenating the string values of the separate expressions, each separated from the other by the value of the SUBSEP variable.
Thus, the following two index operations shall be equivalent:

    var[expr1, expr2, ..., exprn]

    var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]

A multidimensioned index used with the in operator shall be parenthesized. The in operator, which tests for the existence of a particular array element, shall not cause that element to exist. Any other reference to a nonexistent array element shall automatically create it.

Comparisons (with the <, <=, !=, ==, >, and >= operators) shall be made numerically if both operands are numeric or if one is numeric and the other has a string value that is a numeric string. Otherwise, operands shall be converted to strings as required and a string comparison shall be made using the locale-specific collation sequence. The value of the comparison expression shall be 1 if the relation is true, or 0 if the relation is false.

4.1.7.3 Variables and Special Variables

Variables can be used in an awk program by referencing them. With the exception of function parameters (see 4.1.7.6.2), they are not explicitly declared. Uninitialized scalar variables and array elements have both a numeric value of zero and a string value of the empty string.

Field variables shall be designated by a $ followed by a number or numerical expression. The effect of the field number expression evaluating to anything other than a nonnegative integer is unspecified; uninitialized variables or string values need not be converted to numeric values in this context. New field variables can be created by assigning a value to them. References to nonexistent fields (i.e., fields after $NF) shall produce the null string. However, assigning to a nonexistent field [e.g., $(NF+2) = 5] shall increase the value of NF, create any intervening fields with the null string as their values, and cause the value of $0 to be recomputed, with the fields being separated by the value of OFS.
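A minimal Python model of the SUBSEP indexing, the in operator, and field assignment described above follows. The class AwkArray and the function assign_field are hypothetical. Subscripts are assumed to be strings or integers; other numbers would first need the CONVFMT conversion of 4.1.7.2. The value "\034" is used for SUBSEP only because it is a common choice, since the draft leaves the default implementation defined.

```python
# Illustrative only: "\034" is a common SUBSEP, not one the draft mandates.
SUBSEP = "\034"


class AwkArray:
    """Associative array whose subscripts are strings."""

    def __init__(self) -> None:
        self._elems = {}

    @staticmethod
    def key(*subscripts) -> str:
        # var[e1, e2, ...] is var[e1 SUBSEP e2 SUBSEP ...]; this assumes each
        # subscript is a str or an int (str() matches "%d" for ints).
        return SUBSEP.join(str(s) for s in subscripts)

    def contains(self, *subscripts) -> bool:
        # (e1, e2) in var: test for existence without creating the element.
        return self.key(*subscripts) in self._elems

    def get(self, *subscripts):
        # Any other reference creates a missing element (uninitialized,
        # modelled here as the empty string).
        return self._elems.setdefault(self.key(*subscripts), "")

    def set(self, value, *subscripts) -> None:
        self._elems[self.key(*subscripts)] = value

    def __len__(self) -> int:
        return len(self._elems)


def assign_field(fields: list, n: int, value: str, ofs: str = " "):
    """$n = value for n >= 1; returns the new (NF, $0)."""
    if n > len(fields):
        # Intervening fields are created with the null string as value.
        fields.extend([""] * (n - len(fields)))
    fields[n - 1] = value
    # $0 is recomputed with the fields separated by OFS.
    return len(fields), ofs.join(fields)
```

For example, with $0 equal to "a b", assigning $4 = "5" leaves NF at 4 and $0 as "a b  5". The two adjacent OFS characters surround the null third field.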
Each field variable shall have a string value when created. If the string, with any occurrence of the decimal-point character from the current locale changed to a <period>, would be considered a numeric string (see 4.1.7.2), the field variable shall also have the numeric value of the numeric string.

The implementation shall support the following other special variables that are set by awk:

ARGC
        The number of elements in the ARGV array.

ARGV
        An array of command line arguments, excluding options and the program argument, numbered from zero to ARGC-1. The arguments in ARGV can be modified or added to; ARGC can be altered. As each input file ends, awk shall treat the next nonnull element of ARGV, up through the current value of ARGC-1, as the name of the next input file. Thus, setting an element of ARGV to null means that it shall not be treated as an input file. The name '-' shall indicate the standard input. If an argument matches the format of an assignment operand, this argument shall be treated as an assignment rather than a file argument.

CONVFMT
        The printf format for converting numbers to strings (except for output statements, where OFMT is used); "%.6g" by default.

ENVIRON
        The variable ENVIRON is an array representing the value of the environment, as described in POSIX.1 {8} 2.7. The indices of the array shall be strings consisting of the names of the environment variables, and the value of each array element shall be a string consisting of the value of that variable. If the value of an environment variable is considered a numeric string (see 4.1.7.2), the array element shall also have its numeric value.
        In all cases where the behavior of awk is affected by environment variables [including the environment of any command(s) that awk executes via the system function or via pipeline redirections with the print statement, the printf statement, or the getline function], the environment used shall be the environment at the time awk began executing; it is implementation defined whether any modification of ENVIRON affects this environment.

FILENAME
        A pathname of the current input file. Inside a BEGIN action the value is undefined. Inside an END action the value is the name of the last input file processed.

FNR
        The ordinal number of the current record in the current file. Inside a BEGIN action the value is zero. Inside an END action the value is the number of the last record processed in the last file processed.

FS
        Input field separator regular expression; <space> by default.

NF
        The number of fields in the current record. Inside a BEGIN action, the use of NF is undefined unless a getline function without a var argument is executed previously. Inside an END action, NF shall retain the value it had for the last record read, unless a subsequent, redirected, getline function without a var argument is performed prior to entering the END action.

NR
        The ordinal number of the current record from the start of input. Inside a BEGIN action the value is zero. Inside an END action the value is the number of the last record processed.

OFMT
        The printf format for converting numbers to strings in output statements (see 4.1.7.6.1); "%.6g" by default. The result of the conversion is unspecified if the value of OFMT is not a floating-point format specification.

OFS
        The print statement output field separator; <space> by default.

ORS
        The print statement output record separator; <newline> by default.
RLENGTH
        The length of the string matched by the match function.

RS
        The first character of the string value of RS is the input record separator; <newline> by default. If RS contains more than one character, the results are unspecified. If RS is null, then records are separated by sequences of one or more blank lines, leading or trailing blank lines do not result in empty records at the beginning or end of the input, and <newline> is always a field separator, no matter what the value of FS is.

RSTART
        The starting position of the string matched by the match function, numbering from 1. This is always equivalent to the return value of the match function.

SUBSEP
        The subscript separator string for multidimensional arrays; the default value is implementation defined.

4.1.7.4 Regular Expressions

The awk utility shall make use of the extended regular expression notation (see 2.8.4) except that it shall allow the use of C-language conventions for escaping special characters within the EREs, as specified in Table 2-15 and Table 4-2; these escape sequences shall be recognized both inside and outside bracket expressions. Note that records need not be separated by <newline>s and string constants can contain <newline>s, so even the \n sequence is valid in awk EREs. Using a slash character within the regular expression requires the escaping shown in Table 4-2.

A regular expression can be matched against a specific field or string by using one of the two regular expression matching operators, ~ and !~. These operators shall interpret their right-hand operand as a regular expression and their left-hand operand as a string. If the regular expression matches the string, the ~ expression shall evaluate to a value of 1, and the !~ expression shall evaluate to a value of 0.
(The regular expression matching operation is as defined in 2.8.1.2, where a match occurs on any part of the string unless the regular expression is limited with the circumflex or dollar-sign special characters.) If the regular expression does not match the string, the ~ expression shall evaluate to a value of 0, and the !~ expression shall evaluate to a value of 1. If the right-hand operand is any expression other than the lexical token ERE, the string value of the expression shall be interpreted as an extended regular expression, including the escape conventions described above. Note that these same escape conventions also shall be applied in determining the value of a string literal (the lexical token STRING), and thus shall be applied a second time when a string literal is used in this context.

When an ERE token appears as an expression in any context other than as the right-hand operand of the ~ or !~ operator or as one of the built-in function arguments described below, the value of the resulting expression shall be the equivalent of

    $0 ~ /ere/

The ERE argument to the gsub, match, and sub functions, and the fs argument to the split function (see 4.1.7.6.2) shall be interpreted as extended regular expressions. These can be either ERE tokens or arbitrary expressions, and shall be interpreted in the same manner as the right-hand side of the ~ or !~ operator.

An extended regular expression can be used to separate fields by using the -F ERE option or by assigning a string containing the expression to the built-in variable FS. The default value of the FS variable shall be a single <space> character. The following describes FS behavior:

    (1) If FS is a single character:

        (a) If FS is <space>, skip leading and trailing <blank>s; fields shall be delimited by sets of one or more <blank>s.

        (b) Otherwise, if FS is any other character c, fields shall be delimited by each single occurrence of c.
    (2) Otherwise, the string value of FS shall be considered to be an extended regular expression. Each occurrence of a sequence matching the extended regular expression shall delimit fields.

Except in the gsub, match, split, and sub built-in functions, regular expression matching shall be based on input records; i.e., record separator characters (the first character of the value of the variable RS, <newline> by default) cannot be embedded in the expression, and no expression shall match the record separator character. If the record separator is not <newline>, <newline>s embedded in the expression can be matched. In those four built-in functions, regular expression matching shall be based on text strings; i.e., any character (including <newline> and the record separator) can be embedded in the pattern and an appropriate pattern shall match any character. However, in all awk regular expression matching, the use of one or more NUL characters in the pattern, input record, or text string produces undefined results.

4.1.7.5 Patterns

A pattern is any valid expression, a range specified by two expressions separated by a comma, or one of the two special patterns BEGIN or END.

4.1.7.5.1 Special Patterns

The awk utility shall recognize two special patterns, BEGIN and END. Each BEGIN pattern shall be matched once and its associated action executed before the first record of input is read [except possibly by use of the getline function (see 4.1.7.6.2) in a prior BEGIN action] and before command line assignment is done. Each END pattern shall be matched once and its associated action executed after the last record of input has been read. These two patterns shall have associated actions. BEGIN and END shall not combine with other patterns. Multiple BEGIN and END patterns shall be allowed.
The actions associated with the BEGIN patterns shall be executed in the order specified in the program, as are the END actions. An END pattern can precede a BEGIN pattern in a program. If an awk program consists of only actions with the pattern BEGIN, and the BEGIN action contains no getline function, awk shall exit without reading its input when the last statement in the last BEGIN action is executed. If an awk program consists of only actions with the pattern END or only actions with the patterns BEGIN and END, the input shall be read before the statements in the END action(s) are executed.

4.1.7.5.2 Expression Patterns

An expression pattern shall be evaluated as if it were an expression in a Boolean context. If the result is true, the pattern shall be considered to match, and the associated action (if any) shall be executed. If the result is false, the action shall not be executed.

4.1.7.5.3 Pattern Ranges

A pattern range consists of two expressions separated by a comma; in this case, the action shall be performed for all records between a match of the first expression and the following match of the second expression, inclusive. At this point, the pattern range can be repeated starting at input records subsequent to the end of the matched range.

4.1.7.6 Actions

An action is a sequence of statements as shown in the grammar in 4.1.7.7. Any single statement can be replaced by a statement list enclosed in braces. The statements in a statement list shall be separated by <newline>s or semicolons, and shall be executed sequentially in the order that they appear.
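The pattern-range rule can be sketched as a small state machine. This Python sketch is illustrative only, and the patterns are passed as hypothetical predicate functions. It assumes, as common implementations do, that the end pattern is also tested against the record that opened the range. Under that assumption a record matching both expressions forms a range of one record. The draft's wording "the following match" does not settle that case explicitly.

```python
from typing import Callable, Iterable, List

Pred = Callable[[str], bool]


def range_matches(records: Iterable[str], start: Pred, end: Pred) -> List[str]:
    """Records selected by the pattern range  start, end  (inclusive)."""
    selected = []
    active = False
    for rec in records:
        if not active and start(rec):
            active = True              # a match of the first expression opens a range
        if active:
            selected.append(rec)
            # Assumption: the end test also applies to the opening record.
            if end(rec):
                active = False         # the range can start again on later records
    return selected
```

For example, with records a, start, x, end, y, start, z and the patterns record == "start" and record == "end", the selected records are start, x, end, start, z. The second range stays open at end of input because no closing match follows it.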
The expression acting as the conditional in an if statement shall be evaluated and if it is nonzero or nonnull, the following statement shall be executed; otherwise, if else is present, the statement following the else shall be executed.

The if, while, do ... while, for, break, and continue statements are based on the C Standard {7} (see 2.9.2), except that the Boolean expressions shall be treated as described in 4.1.7.2, and except in the case of

    for (variable in array)

which shall iterate, assigning each index of array to variable in an unspecified order. The results of adding new elements to array within such a for loop are undefined. If a break or continue statement occurs outside of a loop, the behavior is undefined.

The delete statement shall remove an individual array element. Thus, the following code shall delete an entire array:

    for (index in array)
        delete array[index]

The next statement shall cause all further processing of the current input record to be abandoned. The behavior is undefined if a next statement appears or is invoked in a BEGIN or END action.

The exit statement shall invoke all END actions in the order in which they occur in the program source and then terminate the program without reading further input. An exit statement inside an END action shall terminate the program without further execution of END actions. If an expression is specified in an exit statement, its numeric value shall be the exit status of awk, unless subsequent errors are encountered or a subsequent exit statement with an expression is executed.

4.1.7.6.1 Output Statements

Both print and printf statements shall write to standard output by default.
The output shall be written to the location specified by output_redirection if one is supplied, as follows:

    > expression
    >> expression
    | expression

In all cases, the expression shall be evaluated to produce a string that is used as a full pathname to write into (for > or >>) or as a command to be executed (for |). Using the first two forms, if the file of that name is not currently open, it shall be opened, creating it if necessary, and, using the first form, truncating the file. The output then shall be appended to the file. As long as the file remains open, subsequent calls in which expression evaluates to the same string value simply shall append output to the file. The file remains open until the close function (see 4.1.7.6.2) is called with an expression that evaluates to the same string value.

The third form shall write output onto a stream piped to the input of a command. The stream shall be created if no stream is currently open with the value of expression as its command name. The stream created shall be equivalent to one created by a call to the popen() function (see B.3.2) with the value of expression as the command argument and a value of "w" as the mode argument. As long as the stream remains open, subsequent calls in which expression evaluates to the same string value shall write output to the existing stream. The stream shall remain open until the close function (see 4.1.7.6.2) is called with an expression that evaluates to the same string value. At that time, the stream shall be closed as if by a call to the pclose() function (see B.3.2).

As described in detail by the grammar in 4.1.7.7, these output statements shall take a comma-separated list of expressions referred to in the grammar by the nonterminal symbols expr_list, print_expr_list, or print_expr_list_opt.
This list is referred to here as the expression list, and each member is referred to as an expression argument.

The print statement shall write the value of each expression argument onto the indicated output stream separated by the current output field separator (see variable OFS above), and terminated by the output record separator (see variable ORS above). All expression arguments shall be taken as strings, being converted if necessary; this conversion shall be as described in 4.1.7.2, with the exception that the printf format in OFMT shall be used instead of the value in CONVFMT. An empty expression list shall stand for the whole input record ($0).

The printf statement shall produce output based on a notation similar to the File Format Notation used to describe file formats in this standard (see 2.12). Output shall be produced as specified with the first expression argument as the string <format> and subsequent expression arguments as the strings <arg1> through <argn>, with the following exceptions:

    (1) The format shall be an actual character string rather than a graphical representation. Therefore, it cannot contain empty character positions. The <space> character in the format string, in any context other than a flag of a conversion specification, shall be treated as an ordinary character that is copied to the output.

    (2) If the character set contains a Δ character and that character appears in the format string, it shall be treated as an ordinary character that is copied to the output.

    (3) The escape sequences beginning with a backslash character shall be treated as sequences of ordinary characters that are copied to the output.
        (Note that these same sequences shall be interpreted lexically by awk when they appear in literal strings, but they shall not be treated specially by the printf statement.)

    (4) A field width or precision can be specified as the * character instead of a digit string. In this case the next argument from the expression list shall be fetched and its numeric value taken as the field width or precision.

    (5) The implementation shall not precede or follow output from the d or u conversion specifications with <blank>s not specified by the format string.

    (6) The implementation shall not precede output from the o conversion specification with leading zeros not specified by the format string.

    (7) For the c conversion specification: if the argument has a numeric value, the character whose encoding is that value shall be output. If the value is zero or is not the encoding of any character in the character set, the behavior is undefined. If the argument does not have a numeric value, the first character of the string value shall be output; if the string does not contain any characters, the behavior is undefined.

    (8) For each conversion specification that consumes an argument, the next expression argument shall be evaluated. With the exception of the c conversion, the value shall be converted (according to the rules specified in 4.1.7.2) to the appropriate type for the conversion specification.

    (9) If there are insufficient expression arguments to satisfy all the conversion specifications in the format string, the behavior is undefined.

    (10) If any character sequence in the format string begins with a % character, but does not form a valid conversion specification, the behavior is unspecified.
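The following Python sketch shows how print assembles its output from OFS, ORS, and OFMT. The function awk_print is hypothetical. Python numbers stand for values that have a numeric value, and strings are passed through unchanged. The empty-list case, which stands for $0, is not modelled.

```python
import math


def awk_print(args, ofs=" ", ors="\n", ofmt="%.6g") -> str:
    """Return the text that print writes for a nonempty expression list."""
    parts = []
    for a in args:
        if isinstance(a, (int, float)) and math.isfinite(a) and a == int(a):
            parts.append("%d" % a)       # integral values use "%d"
        elif isinstance(a, float):
            parts.append(ofmt % a)       # other numbers use OFMT, not CONVFMT
        else:
            parts.append(a)              # strings are written unchanged
    return ofs.join(parts) + ors
```

Python's % operator follows the C rules closely enough to illustrate printf exception (4). In "%*d|" % (5, 42), the 5 is consumed as the field width and the result is "   42|".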
Both print and printf can output at least {LINE_MAX} bytes.

4.1.7.6.2 Functions

The awk language has a variety of built-in functions: arithmetic, string, input/output, and general.

4.1.7.6.2.1 Arithmetic Functions

The arithmetic functions, except for int, shall be based on the C Standard {7}; see 2.9.2. The behavior is undefined in cases where the C Standard {7} specifies that an error be returned or that the behavior is undefined.

atan2(y,x)
        Return arctangent of y/x.

cos(x)
        Return cosine of x, where x is in radians.

sin(x)
        Return sine of x, where x is in radians.

exp(x)
        Return the exponential function of x.

log(x)
        Return the natural logarithm of x.

sqrt(x)
        Return the square root of x.

int(x)
        Truncate its argument to an integer. It shall be truncated toward 0 when x > 0.

rand()
        Return a random number n, such that 0 <= n < 1.

srand([expr])
        Set the seed value for rand to expr or use the time of day if expr is omitted. The previous seed value shall be returned.

4.1.7.6.2.2 String Functions

The string functions are:

gsub(ere, repl[,in])
        Behave like sub (see below), except that it shall replace all occurrences of the regular expression (like the ed utility global substitute) in $0 or in the in argument, when specified.

index(s, t)
        Return the position, in characters, numbering from 1, in string s where string t first occurs, or zero if it does not occur at all.
length([s])
        Return the length, in characters, of its argument taken as a string, or of the whole record, $0, if there is no argument.

match(s, ere)
        Return the position, in characters, numbering from 1, in string s where the extended regular expression ere occurs, or zero if it does not occur at all. RSTART shall be set to the starting position (which is the same as the returned value), zero if no match is found; RLENGTH shall be set to the length of the matched string, -1 if no match is found.

split(s, a[,fs])
        Split the string s into array elements a[1], a[2], ..., a[n], and return n. The separation shall be done with the extended regular expression fs or with the field separator FS if fs is not given. Each array element shall have a string value when created. If the string assigned to any array element, with any occurrence of the decimal-point character from the current locale changed to a <period>, would be considered a numeric string (see 4.1.7.2), the array element shall also have the numeric value of the numeric string. The effect of a null string as the value of fs is unspecified.

sprintf(fmt, expr, expr, ...)
        Format the expressions according to the printf format given by fmt and return the resulting string.

sub(ere, repl[,in])
        Substitute the string repl in place of the first instance of the extended regular expression ere in string in and return the number of substitutions. An ampersand (&) appearing in the string repl shall be replaced by the string from in that matches the regular expression. An ampersand preceded by a backslash within repl shall be interpreted as a literal ampersand character.
        If in is specified and it is not an lvalue (see 4.1.7.2), the behavior is undefined. If in is omitted, awk shall substitute in the current record ($0).

substr(s, m[,n])
        Return the at most n-character substring of s that begins at position m, numbering from 1. If n is missing, the length of the substring shall be limited by the length of the string s.

tolower(s)
        Return a string based on the string s. Each character in s that is an uppercase letter specified to have a tolower mapping by the LC_CTYPE category of the current locale shall be replaced in the returned string by the lowercase letter specified by the mapping. Other characters in s shall be unchanged in the returned string.

toupper(s)
        Return a string based on the string s. Each character in s that is a lowercase letter specified to have a toupper mapping by the LC_CTYPE category of the current locale shall be replaced in the returned string by the uppercase letter specified by the mapping. Other characters in s shall be unchanged in the returned string.

All of the preceding functions that take ERE as a parameter expect a pattern or a string-valued expression that is a regular expression as defined in 4.1.7.4.

4.1.7.6.2.3 Input/Output and General Functions

The input/output and general functions are:

close(expression)
        Close the file or pipe opened by a print or printf statement or a call to getline with the same string-valued expression. The limit on the number of open expression arguments is implementation defined. If the close was successful, the function shall return zero; otherwise, it shall return nonzero.

expression | getline [var]
        Read a record of input from a stream piped from the output of a command.
        The stream shall be created if no stream is currently open with the value of expression as its command name. The stream created shall be equivalent to one created by a call to the popen() function with the value of expression as the command argument and a value of "r" as the mode argument. As long as the stream remains open, subsequent calls in which expression evaluates to the same string value shall read subsequent records from the stream. The stream shall remain open until the close function is called with an expression that evaluates to the same string value. At that time, the stream shall be closed as if by a call to the pclose() function. If var is missing, $0 and NF shall be set; otherwise, var shall be set.

getline
        Set $0 to the next input record from the current input file. This form of getline shall set the NF, NR, and FNR variables.

getline var
        Set variable var to the next input record from the current input file. This form of getline shall set the FNR and NR variables.

getline [var] < expression
        Read the next record of input from a named file. The expression shall be evaluated to produce a string that is used as a full pathname. If the file of that name is not currently open, it shall be opened. As long as the stream remains open, subsequent calls in which expression evaluates to the same string value shall read subsequent records from the file. The file shall remain open until the close function is called with an expression that evaluates to the same string value. If var is missing, $0 and NF shall be set; otherwise, var shall be set.

system(expression)
        Execute the command given by expression in a manner equivalent to the system() function (see B.3.1) and return the exit status of the command.

All forms of getline shall return 1 for successful input, zero for end of file, and -1 for an error.
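The getline [var] < expression form and close() can be modelled as a table of open files keyed by the string value of expression. The class GetlineFiles below is a hypothetical Python sketch built on several assumptions. RS is assumed to be <newline>. The record is returned rather than assigned to var. close() returns -1 for a name that is not open, although the draft requires only a nonzero value.

```python
from typing import IO, Dict, Optional, Tuple


class GetlineFiles:
    """Hypothetical model of  getline var < expression  and close()."""

    def __init__(self) -> None:
        self._open: Dict[str, IO[str]] = {}

    def getline(self, name: str) -> Tuple[int, Optional[str]]:
        """Return (status, record): 1 on input, 0 at end of file, -1 on error."""
        try:
            f = self._open.get(name)
            if f is None:                # first use of this name: open and keep open
                f = open(name, "r")
                self._open[name] = f
            line = f.readline()          # assumes RS is <newline>
        except OSError:
            return -1, None
        if line == "":
            return 0, None
        return 1, line[:-1] if line.endswith("\n") else line

    def close(self, name: str) -> int:
        """Zero on success; nonzero (here -1) if nothing is open under name."""
        f = self._open.pop(name, None)
        if f is None:
            return -1
        f.close()
        return 0
```

After close(), the next getline on the same name reopens the file and starts again from its first record. This matches the rule that the file stays open only until close is called with the same string value.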
4.1.7.6.2.4 User-Defined Functions

The awk language also shall provide user-defined functions. Such functions can be defined as:

    function name(args,...) { statements }

A function can be referred to anywhere in an awk program; in particular, its use can precede its definition. The scope of a function shall be global.

Function arguments can be either scalars or arrays; the behavior is undefined if an array name is passed as an argument that the function uses as a scalar, or if a scalar expression is passed as an argument that the function uses as an array. Function arguments shall be passed by value if scalar and by reference if array name. Argument names shall be local to the function; all other variable names shall be global. The same name shall not be used as both an argument name and as the name of a function or a special awk variable. The same name shall not be used both as a variable name with global scope and as the name of a function. The same name shall not be used within the same scope both as a scalar variable and as an array.

The number of parameters in the function definition need not match the number of parameters in the function call. Excess formal parameters can be used as local variables. If fewer arguments are supplied in a function call than are in the function definition, the extra parameters that are used in the function body as scalars shall be initialized with a string value of the null string and a numeric value of zero, and the extra parameters that are used in the function body as arrays shall be initialized as empty arrays. If more arguments are supplied in a function call than are in the function definition, the behavior is undefined.
When invoking a function, no white space can be placed between the function name and the opening parenthesis. The implementation shall permit function calls to be nested and recursive calls to be made upon functions. Upon return from any nested or recursive function call, the values of all of the calling function's parameters shall be unchanged, except for array parameters passed by reference. The return statement can be used to return a value. If a return statement appears outside of a function definition, the behavior is undefined.

In the function definition, <newline>s shall be optional before the opening brace and after the closing brace. Function definitions can appear anywhere in the program where a pattern-action pair is allowed.

4.1.7.7 awk Grammar

The grammar in this subclause and the lexical conventions in the following subclause shall together describe the syntax for awk programs. The general conventions for this style of grammar are described in 2.1.2. A valid program can be represented as the nonterminal symbol program in the grammar. Any discrepancies found between this grammar and other descriptions in this clause shall be resolved in favor of this grammar.
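The function rules of 4.1.7.6.2.4, together with the function-definition and FUNC_NAME productions in the grammar, can be illustrated with a small sketch. It assumes a POSIX sh and awk, and the names func_demo, fill, and fact are hypothetical. The sketch shows four behaviors: an array passed by reference, a scalar passed by value, an extra formal parameter used as a local, and a recursive call. The call split("", sq) is used only as a common idiom so that sq is unambiguously an array when it is passed.

```sh
#!/bin/sh
# Sketch only: user-defined awk functions. func_demo, fill, and fact are
# hypothetical names used for illustration.
func_demo() {
    awk '
    # i is an extra formal parameter: the caller passes no value for it,
    # so it starts as the null string / zero and acts as a local variable.
    function fill(arr, n,    i) {
        for (i = 1; i <= n; i++)
            arr[i] = i * i      # array argument: passed by reference
        n = 0                   # scalar argument: passed by value
        return i
    }

    # Recursive call; no white space between "fact" and "(" in the call.
    function fact(k) {
        return k <= 1 ? 1 : k * fact(k - 1)
    }

    BEGIN {
        split("", sq)           # make sq an (empty) array before passing it
        count = 3
        last = fill(sq, count)
        print count, last, sq[1], sq[2], sq[3]   # count is still 3
        print fact(5)
    }'
}

func_demo
```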
%token NAME NUMBER STRING ERE NEWLINE
%token FUNC_NAME    /* name followed by '(' without white space */

/* Keywords */
%token Begin End                        /* 'BEGIN' 'END' */
%token Break Continue Delete Do Else    /* 'break' 'continue' 'delete' 'do' 'else' */
%token Exit For Function If In          /* 'exit' 'for' 'function' 'if' 'in' */
%token Next Print Printf Return While   /* 'next' 'print' 'printf' 'return' 'while' */

/* Reserved function names */
%token BUILTIN_FUNC_NAME
        /* one token for the following:
         * atan2 cos sin exp log sqrt int rand srand
         * gsub index length match split sprintf sub substr
         * tolower toupper close system
         */
%token GETLINE
        /* Syntactically different from other built-ins */

/* Two-character tokens */
%token ADD_ASSIGN SUB_ASSIGN MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN POW_ASSIGN
/*     '+='       '-='       '*='       '/='       '%='       '^='       */

%token OR   AND  NO_MATCH EQ   LE   GE   NE   INCR DECR APPEND
/*     '||' '&&' '!~'     '==' '<=' '>=' '!=' '++' '--' '>>'   */

/* One-character tokens */
%token '{' '}' '(' ')' '[' ']' ',' ';'
%token '+' '-' '*' '%' '^' '!' '>' '<' '|' '?' ':' '~' '$' '='

%start program
%%

program:
      item_list
    | actionless_item_list
    ;

item_list:
      newline_opt
    | actionless_item_list item terminator
    | item_list item terminator
    | item_list action terminator
    ;

actionless_item_list:
      item_list pattern terminator
    | actionless_item_list pattern terminator
    ;

item:
      pattern action
    | Function NAME '(' param_list_opt ')' newline_opt action
    | Function FUNC_NAME '(' param_list_opt ')' newline_opt action
    ;

param_list_opt:
      /* empty */
    | param_list
    ;

param_list:
      NAME
    | param_list ',' NAME
    ;

pattern:
      Begin
    | End
    | expr
    | expr ',' newline_opt expr
    ;

action:
      '{' newline_opt '}'
    | '{' newline_opt terminated_statement_list '}'
    | '{' newline_opt unterminated_statement_list '}'
    ;

terminator:
      ';'
    | NEWLINE
    | terminator NEWLINE ';'
    ;

terminated_statement_list:
      terminated_statement
    | terminated_statement_list terminated_statement
    ;

unterminated_statement_list:
      unterminated_statement
    | terminated_statement_list unterminated_statement
    ;

terminated_statement:
      action newline_opt
    | If '(' expr ')' newline_opt terminated_statement
          Else newline_opt terminated_statement
    | While '(' expr ')' newline_opt terminated_statement
    | For '(' simple_statement_opt ';' expr_opt ';' simple_statement_opt ')'
          newline_opt terminated_statement
    | For '(' NAME In NAME ')' newline_opt terminated_statement
    | ';' newline_opt
    | terminatable_statement NEWLINE newline_opt
    | terminatable_statement ';' newline_opt
    ;

unterminated_statement:
      terminatable_statement
    | If '(' expr ')' newline_opt unterminated_statement
    | If '(' expr ')' newline_opt terminated_statement
          Else newline_opt unterminated_statement
    | While '(' expr ')' newline_opt unterminated_statement
    | For '(' simple_statement_opt ';' expr_opt ';' simple_statement_opt ')'
          newline_opt unterminated_statement
    | For '(' NAME In NAME ')' newline_opt unterminated_statement
    ;

terminatable_statement:
      simple_statement
    | Break
    | Continue
    | Next
    | Exit expr_opt
    | Return expr_opt
    | Do newline_opt terminated_statement While '(' expr ')'
    ;

simple_statement_opt:
      /* empty */
    | simple_statement
    ;

simple_statement:
      Delete NAME '[' expr_list ']'
    | expr
    | print_statement
    ;

print_statement:
      simple_print_statement
    | simple_print_statement output_redirection
    ;

simple_print_statement:
      Print print_expr_list_opt
    | Print '(' multiple_expr_list ')'
    | Printf print_expr_list
    | Printf '(' multiple_expr_list ')'
    ;

output_redirection:
      '>' expr
    | APPEND expr
    | '|' expr
    ;

expr_list_opt:
      /* empty */
    | expr_list
    ;

expr_list:
      expr
    | multiple_expr_list
    ;

multiple_expr_list:
      expr ',' newline_opt expr
    | multiple_expr_list ',' newline_opt expr
    ;

expr_opt:
      /* empty */
    | expr
    ;

expr:
      unary_expr
    | non_unary_expr
    ;

unary_expr:
      '+' expr
    | '-' expr
    | unary_expr '^' expr
    | unary_expr '*' expr
    | unary_expr '/' expr
    | unary_expr '%' expr
    | unary_expr '+' expr
    | unary_expr '-' expr
    | unary_expr non_unary_expr
    | unary_expr '<' expr
    | unary_expr LE expr
    | unary_expr NE expr
    | unary_expr EQ expr
    | unary_expr '>' expr
    | unary_expr GE expr
    | unary_expr '~' expr
    | unary_expr NO_MATCH expr
    | unary_expr In NAME
    | unary_expr AND newline_opt expr
    | unary_expr OR newline_opt expr
    | unary_expr '?' expr ':' expr
    | unary_input_function
    ;

non_unary_expr:
      '(' expr ')'
    | '!' expr
    | non_unary_expr '^' expr
    | non_unary_expr '*' expr
    | non_unary_expr '/' expr
    | non_unary_expr '%' expr
    | non_unary_expr '+' expr
    | non_unary_expr '-' expr
    | non_unary_expr non_unary_expr
    | non_unary_expr '<' expr
    | non_unary_expr LE expr
    | non_unary_expr NE expr
    | non_unary_expr EQ expr
    | non_unary_expr '>' expr
    | non_unary_expr GE expr
    | non_unary_expr '~' expr
    | non_unary_expr NO_MATCH expr
    | non_unary_expr In NAME
    | '(' multiple_expr_list ')' In NAME
    | non_unary_expr AND newline_opt expr
    | non_unary_expr OR newline_opt expr
    | non_unary_expr '?' expr ':' expr
    | NUMBER
    | STRING
    | lvalue
    | ERE
    | lvalue INCR
    | lvalue DECR
    | INCR lvalue
    | DECR lvalue
    | lvalue POW_ASSIGN expr
    | lvalue MOD_ASSIGN expr
    | lvalue MUL_ASSIGN expr
    | lvalue DIV_ASSIGN expr
    | lvalue ADD_ASSIGN expr
    | lvalue SUB_ASSIGN expr
    | lvalue '=' expr
    | FUNC_NAME '(' expr_list_opt ')'    /* no white space allowed */
    | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
    | BUILTIN_FUNC_NAME
    | non_unary_input_function
    ;

print_expr_list_opt:
      /* empty */
    | print_expr_list
    ;

print_expr_list:
      print_expr
    | print_expr_list ',' newline_opt print_expr
    ;

print_expr:
      unary_print_expr
    | non_unary_print_expr
    ;

unary_print_expr:
      '+' print_expr
    | '-' print_expr
    | unary_print_expr '^' print_expr
    | unary_print_expr '*' print_expr
    | unary_print_expr '/' print_expr
    | unary_print_expr '%' print_expr
    | unary_print_expr '+' print_expr
    | unary_print_expr '-' print_expr
    | unary_print_expr non_unary_print_expr
    | unary_print_expr '~' print_expr
    | unary_print_expr NO_MATCH print_expr
    | unary_print_expr In NAME
    | unary_print_expr AND newline_opt print_expr
    | unary_print_expr OR newline_opt print_expr
    | unary_print_expr '?' print_expr ':' print_expr
    ;

non_unary_print_expr:
      '(' expr ')'
    | '!' print_expr
    | non_unary_print_expr '^' print_expr
    | non_unary_print_expr '*' print_expr
    | non_unary_print_expr '/' print_expr
    | non_unary_print_expr '%' print_expr
    | non_unary_print_expr '+' print_expr
    | non_unary_print_expr '-' print_expr
    | non_unary_print_expr non_unary_print_expr
    | non_unary_print_expr '~' print_expr
    | non_unary_print_expr NO_MATCH print_expr
    | non_unary_print_expr In NAME
    | '(' multiple_expr_list ')' In NAME
    | non_unary_print_expr AND newline_opt print_expr
    | non_unary_print_expr OR newline_opt print_expr
    | non_unary_print_expr '?' print_expr ':' print_expr
    | NUMBER
    | STRING
    | lvalue
    | ERE
    | lvalue INCR
    | lvalue DECR
    | INCR lvalue
    | DECR lvalue
    | lvalue POW_ASSIGN print_expr
    | lvalue MOD_ASSIGN print_expr
    | lvalue MUL_ASSIGN print_expr
    | lvalue DIV_ASSIGN print_expr
    | lvalue ADD_ASSIGN print_expr
    | lvalue SUB_ASSIGN print_expr
    | lvalue '=' print_expr
    | FUNC_NAME '(' expr_list_opt ')'    /* no white space allowed */
    | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
    | BUILTIN_FUNC_NAME
    ;

lvalue:
      NAME
    | NAME '[' expr_list ']'
    | '$' expr
    ;

non_unary_input_function:
      simple_get
    | simple_get '<' expr
    | non_unary_expr '|' simple_get
    ;

unary_input_function:
      unary_expr '|' simple_get
    ;

simple_get:
      GETLINE
    | GETLINE lvalue
    ;

newline_opt:
      /* empty */
    | newline_opt NEWLINE
    ;

This grammar has several ambiguities that shall be resolved as follows:

- Operator precedence and associativity shall be as described in Table 4-1.
- In case of ambiguity, an else shall be associated with the most immediately preceding if that would satisfy the grammar.

4.1.7.8 awk Lexical Conventions

The lexical conventions for awk programs, with respect to the preceding grammar, shall be as follows:

(1) Except as noted, awk shall recognize the longest possible token or delimiter beginning at a given point.

(2) A comment shall consist of any characters beginning with the number sign character (#) and terminated by, but excluding the next occurrence of, a <newline> character. Comments shall have no effect, except to delimit lexical tokens.

(3) The <newline> character shall be recognized as the token NEWLINE.

(4) A backslash character immediately followed by a <newline> character shall have no effect.

(5) The token STRING shall represent a string constant. A string constant shall begin with the character ". Within a string constant, a backslash character shall be considered to begin an escape sequence as specified in Table 2-15 (see 2.12). In addition, the escape sequences in Table 4-2 shall be recognized. A <newline> character shall not occur within a string constant. A string constant shall be terminated by the first unescaped occurrence of the character " after the one that begins the string constant. The value of the string shall be the sequence of all unescaped characters and values of escape sequences between, but not including, the two delimiting " characters.
Table 4-2 - awk Escape Sequences

Escape
Sequence  Description                         Meaning
________  __________________________________  __________________________________
\"        Backslash followed by               Quotation-mark character
          quotation-mark
\/        Backslash followed by slash         Slash character
\ddd      A backslash character followed by   The character whose encoding is
          the longest sequence of one, two,   represented by the one-, two-, or
          or three octal-digit characters     three-digit octal integer. If the
          (01234567). If all of the digits    size of a byte on the system is
          are 0 (i.e., representation of the  greater than nine bits, the valid
          NUL character), the behavior is     escape sequence used to represent
          undefined.                          a byte is implementation defined.
                                              Multibyte characters require
                                              multiple, concatenated escape
                                              sequences of this type, including
                                              the leading \ for each byte.
\c        A backslash character followed by   Undefined
          any character not described in
          this table or in Table 2-15

(6) The token ERE shall represent an extended regular expression constant. An ERE constant shall begin with the slash character. Within an ERE constant, a backslash character shall be considered to begin an escape sequence as specified in Table 2-15 (see 2.12). In addition, the escape sequences in Table 4-2 shall be recognized. A <newline> character shall not occur within an ERE constant. An ERE constant shall be terminated by the first unescaped occurrence of the slash character after the one that begins the ERE constant.
The extended regular expression represented by the ERE constant shall be the sequence of all unescaped characters and values of escape sequences between, but not including, the two delimiting slash characters.

(7) A <blank> shall have no effect, except to delimit lexical tokens or within STRING or ERE tokens.

(8) The token NUMBER shall represent a numeric constant. Its form and numeric value shall be equivalent to either of the tokens floating-constant or integer-constant as specified by the C Standard {7}, with the following exceptions:

    (a) An integer constant cannot begin with 0x or include the hexadecimal digits a, b, c, d, e, f, A, B, C, D, E, or F.

    (b) The value of an integer constant beginning with 0 shall be taken in decimal rather than octal.

    (c) An integer constant cannot include a suffix (u, U, l, or L).

    (d) A floating constant cannot include a suffix (f, F, l, or L).

If the value is too large or too small to be representable (see 2.9.2.1), the behavior is undefined.

(9) A sequence of underscores, digits, and alphabetics from the portable character set (see 2.4), beginning with an underscore or alphabetic, shall be considered a word.

(10) The following words are keywords that shall be recognized as individual tokens; the name of the token is the same as the keyword:

    BEGIN      delete     for        in         printf
    END        do         function   next       return
    break      else       getline    print      while
    continue   exit       if

(11) The following words are names of built-in functions and shall be recognized as the token BUILTIN_FUNC_NAME:

    atan2      index      match      sprintf    substr
    close      int        rand       sqrt       system
    cos        length     sin        srand      tolower
    exp        log        split      sub        toupper
    gsub

The above-listed keywords and names of built-in functions are considered reserved words.
(12) The token NAME shall consist of a word that is not a keyword or a name of a built-in function and is not followed immediately (without any delimiters) by the ( character.

(13) The token FUNC_NAME shall consist of a word that is not a keyword or a name of a built-in function, followed immediately (without any delimiters) by the ( character. The ( character shall not be included as part of the token.

(14) The following two-character sequences shall be recognized as the named tokens:

    Token Name   Sequence      Token Name   Sequence
    __________   ________      __________   ________
    ADD_ASSIGN   +=            NO_MATCH     !~
    SUB_ASSIGN   -=            EQ           ==
    MUL_ASSIGN   *=            LE           <=
    DIV_ASSIGN   /=            GE           >=
    MOD_ASSIGN   %=            NE           !=
    POW_ASSIGN   ^=            INCR         ++
    OR           ||            DECR         --
    AND          &&            APPEND       >>

(15) The following single characters shall be recognized as tokens whose names are the character:

    { } ( ) [ ] , ; + - * % ^ ! > < | ? : ~ $ =

There is a lexical ambiguity between the token ERE and the tokens / and DIV_ASSIGN. When an input sequence begins with a slash character in any syntactic context where the token / or DIV_ASSIGN could appear as the next token in a valid program, the longer of those two tokens that can be recognized shall be recognized. In any other syntactic context where the token ERE could appear as the next token in a valid program, the token ERE shall be recognized.

4.1.8 Exit Status

The awk utility shall exit with one of the following values:

     0    All input files were processed successfully.
    >0    An error occurred.

The exit status can be altered within the program by using an exit expression.

4.1.9 Consequences of Errors

If any file operand is specified and the named file cannot be accessed, awk shall write a diagnostic message to standard error and terminate without any further action.
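The slash ambiguity, the escape sequences of Table 4-2, and the exit status rules can be checked with a short sketch. It assumes four things: a POSIX sh, an awk utility on PATH, an ASCII-based character set for the octal escapes, and that /nonexistent/status_demo does not exist. The names lex_demo and status_demo are hypothetical. The sketch only checks for a nonzero status on an inaccessible file operand; it does not check whether processing stopped.

```sh
#!/bin/sh
# Sketch only: lexical handling of "/" and of escape sequences, plus exit
# status. lex_demo and status_demo are hypothetical names; the octal escape
# output assumes an ASCII-based character set.
lex_demo() {
    awk 'BEGIN {
        a = 12; b = 3; c = 2
        print a / b / c          # "/" after an operand: division, prints 2
        x = 10; x /= 4
        print x                  # DIV_ASSIGN, prints 2.5
        print ("a/b" ~ /\//)     # "/" where an operand is expected: ERE, prints 1
        print "\101\102", "x\"y" # \ddd and \" escapes, prints AB x"y
    }'
}

status_demo() {
    awk 'BEGIN { exit 3 }'
    echo "exit expression: $?"
    # A file operand that cannot be accessed must give a nonzero status.
    if awk '{ print }' /nonexistent/status_demo 2>/dev/null; then
        echo "missing file: zero"
    else
        echo "missing file: nonzero"
    fi
}

lex_demo
status_demo
```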
If the program specified by either the program operand or the progfile operand(s) is not a valid awk program (as specified in 4.1.7), the behavior is undefined.

BEGIN_RATIONALE

4.1.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The awk program specified in the command line is most easily specified within single-quotes (e.g., 'program') for applications using sh, because awk programs commonly contain characters that are special to the shell, including double-quotes. In the cases where an awk program contains single-quote characters, it is usually easiest to specify most of the program as strings within single-quotes concatenated by the shell with quoted single-quote characters. For example,

        awk '/'\''/ { print "quote:", $0 }'

prints all lines from the standard input containing a single-quote character, prefixed with quote:.

The following are examples of simple awk programs:

(1) Write to the standard output all input lines for which field 3 is greater than 5.

        $3 > 5

(2) Write every tenth line.

        (NR % 10) == 0

(3) Write any line with a substring matching the regular expression.

        /(G|D)(2[0-9][[:alpha:]]*)/

(4) Write any line in which the second field matches the regular expression and the fourth field does not.

        $2 ~ /xyz/ && $4 !~ /xyz/

(5) Write any line in which the second field contains a backslash.

        $2 ~ /\\/

(6) Write any line in which the second field contains a backslash. Note that backslash escapes are interpreted twice, once in lexical processing of the string and once in processing the regular expression.

        $2 ~ "\\\\"

(7) Write the second to the last and the last field in each line. Separate the fields by a colon.
        {OFS=":";print $(NF-1), $NF}

(8) Write the line number and number of fields in each line. The three strings representing the line number, the colon, and the number of fields are concatenated and that string is written to standard output.

        {print NR ":" NF}

(9) Write lines longer than 72 characters.

        length($0) > 72

(10) Write the first two fields in opposite order separated by OFS:

        { print $2, $1 }

(11) Same, with input fields separated by a comma and/or <space>s and <tab>s:

        BEGIN { FS = ",[ \t]*|[ \t]+" }
              { print $2, $1 }

(12) Add up the first column, print the sum and average.

        {s += $1 }
        END {print "sum is ", s, " average is", s/NR}

(13) Write fields in reverse order, one per line (many lines out for each line in):

        { for (i = NF; i > 0; --i) print $i }

(14) Write all lines between occurrences of the strings start and stop:

        /start/, /stop/

(15) Write all lines whose first field is different from the previous one:

        $1 != prev { print; prev = $1 }

(16) Simulate echo:

        BEGIN {
            for (i = 1; i < ARGC; ++i)
                printf "%s%s", ARGV[i], i==ARGC-1?"\n":" "
        }

(17) Write the path prefixes contained in the PATH environment variable, one per line:

        BEGIN {
            n = split (ENVIRON["PATH"], path, ":")
            for (i = 1; i <= n; ++i)
                print path[i]
        }

(18) If there is a file named ``input'' containing page headers of the form:

        Page #

and a file named ``program'' that contains:

        /Page/{ $2 = n++; }
              { print }

then the command line:

        awk -f program n=5 input

will print the file ``input,'' filling in page numbers starting at 5.
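Several of the example programs above can be run on small inputs as a quick check. This is a sketch assuming a POSIX sh and awk. The wrapper function names are hypothetical. In the sum-and-average example, the extra spaces inside the string constants are dropped so the output uses single spaces.

```sh
#!/bin/sh
# Sketch only: several of the example programs above, run on small inputs.
# The wrapper function names are hypothetical.
sum_average() {    # example (12), without the extra spaces in the strings
    printf '10\n20\n30\n' |
        awk '{ s += $1 } END { print "sum is", s, "average is", s/NR }'
}

simulate_echo() {  # example (16)
    awk 'BEGIN {
        for (i = 1; i < ARGC; ++i)
            printf "%s%s", ARGV[i], (i == ARGC-1 ? "\n" : " ")
    }' "$@"
}

reverse_fields() { # example (13)
    printf 'a b c\n' | awk '{ for (i = NF; i > 0; --i) print $i }'
}

long_lines() {     # example (9); reads standard input
    awk 'length($0) > 72'
}

sum_average
simulate_echo hello posix world
reverse_fields
```

A program consisting only of a BEGIN action, as in the echo example, does not read any input, so its arguments are available in ARGV without being opened as files.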
The index, length, match, and substr functions should not be confused with similar functions in the C Standard {7}; the awk versions deal with characters, while the C Standard {7} deals with bytes.

To forestall any possible confusion, where strings are used as the name of a file or pipeline, the strings must be textually identical. The terminology ``same string value'' implies that ``equivalent strings,'' even those that differ only by <blank>s, represent different files.

History of Decisions Made

This description is based on the new awk, ``nawk,'' (see The AWK Programming Language {B21}), which introduced a number of new features to the historical awk:

(1) New keywords: delete, do, function, return

(2) New built-in functions: atan2, cos, sin, rand, srand, gsub, sub, match, close, system

(3) New predefined variables: FNR, ARGC, ARGV, RSTART, RLENGTH, SUBSEP

(4) New expression operators: ?:, ^

(5) The FS variable and the third argument to split are now treated as extended regular expressions.

(6) The operator precedence has changed to more closely match C. Two examples of code that operate differently are:

        while ( n /= 10 > 1) ...
        if (!"wk" ~ /bwk/) ...

Several features have been added based on newer implementations of awk:

(1) Multiple instances of -f progfile are permitted.

(2) New option: -v assignment

(3) New predefined variable: ENVIRON

(4) New built-in functions: toupper, tolower

(5) More formatting capabilities added to printf to match the C Standard {7}.

Regular expressions have been extended somewhat from traditional implementations to make them a pure superset of Extended Regular Expressions as defined by this standard (see 2.8.4).
The main extensions are internationalization features and interval expressions. Traditional implementations of awk have long supported escape sequences as an extension to regular expressions, and this extension has been retained despite inconsistency with other utilities. The number of escape sequences recognized in both regular expressions and strings has varied (generally increasing with time) among implementations. The set specified by the standard includes most sequences known to be supported by popular implementations and by the C Standard {7}. One sequence that is not supported is the hexadecimal value escape beginning with "\x". This would allow values expressed in more than 9 bits to be used within awk as in the C Standard {7}. However, because this syntax has no fixed length, the character following such an escape cannot be a hexadecimal digit. This limitation can be worked around in the C language by the use of lexical string concatenation. In the awk language, concatenation could also be a solution for strings, but not for regular expressions (either lexical ERE tokens or strings used dynamically as regular expressions). Because of this limitation, the feature has not been added to POSIX.2.

When a string variable is used in a context where an ERE normally appears (where the lexical token ERE is used in the grammar), the string does not contain the literal slashes.

Some versions of awk allow the form:

        func name(args,...) { statements }

This has been deprecated by the language's authors, who have asked that it not be included in the standard.

Traditional implementations of awk produce an error if a next statement is executed in a BEGIN action, and cause awk to terminate if a next statement is executed in an END action. This behavior has not been documented, and it was not believed that it was necessary to standardize it.
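A string used where an ERE is expected does not include the delimiting slashes, and its backslashes are processed twice. Both points can be shown with a short sketch. It assumes a POSIX sh and awk, and dynamic_ere_demo is a hypothetical name.

```sh
#!/bin/sh
# Sketch only: strings used where an ERE is expected. dynamic_ere_demo is a
# hypothetical name.
dynamic_ere_demo() {
    awk 'BEGIN {
        re = "^[0-9]+$"                     # no enclosing slashes
        print ("123" ~ re), ("12a" ~ re)    # prints 1 0
        re = "/x/"                          # these slashes are literal characters
        print ("x" ~ re), ("/x/" ~ re)      # prints 0 1
        re = "a\\.b"                        # string escape first, then ERE escape
        print ("a.b" ~ re), ("axb" ~ re)    # prints 1 0
    }'
}

dynamic_ere_demo
```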
The specification of conversions between string and numeric values is much more detailed than in the documentation of traditional implementations or in The AWK Programming Language {B21}. Although most of the behavior is designed to be intuitive, the details are necessary to ensure compatible behavior from different implementations. This is especially important in relational expressions, since the types of the operands determine whether a string or numeric comparison is performed. From the perspective of an application writer, it is usually sufficient to expect intuitive behavior and to force conversions (by adding zero or concatenating a null string) when the type of an expression does not obviously match what is needed.

The intent has been to specify existing practice in almost all cases. The one exception is that, in traditional implementations, variables and constants maintain both string and numeric values after their original value is converted by any use. This means that referencing a variable or constant can have unexpected side effects. For example, with traditional implementations the following program:

        {
            a = "+2"
            b = 2
            if (NR % 2)
                c = a + b
            if (a == b)
                print "numeric comparison"
            else
                print "string comparison"
        }

would perform a numeric comparison (and output numeric comparison) for each odd-numbered line, but perform a string comparison (and output string comparison) for each even-numbered line. POSIX.2 ensures that comparisons will be numeric if necessary. With traditional implementations, the following program:

        BEGIN {
            OFMT = "%e"
            print 3.14
            OFMT = "%f"
            print 3.14
        }

would output 3.140000e+00 twice, because in the second print statement the constant 3.14 would have a string value from the previous conversion.
The standard requires that the output of the second print statement be 3.140000. The behavior of traditional implementations was seen as too unintuitive and unpredictable.

However, a further modification was made in Draft 11. It was pointed out that with the Draft 10 rules, the following script would print nothing:

        BEGIN {
            y[1.5] = 1
            OFMT = "%e"
            print y[1.5]
        }

Therefore, a new variable, CONVFMT, was introduced. The OFMT variable is now restricted to affecting output conversions of numbers to strings and CONVFMT is used for internal conversions, such as comparisons or array indexing. The default value is the same as that for OFMT, so unless a program changes CONVFMT (which no historical program would do), it will receive the historical behavior associated with internal string conversions.

The POSIX awk lexical and syntactic conventions are specified more formally than in other sources. Again the intent has been to specify existing practice. One convention that may be less obvious from the formal grammar than from other verbal descriptions is where <newline>s are acceptable. There are several obvious placements such as terminating a statement, and a backslash can be used to escape <newline>s between any lexical tokens. In addition, <newline>s without backslashes can follow a comma, an open brace, a logical AND operator (&&), a logical OR operator (||), the do keyword, the else keyword, and the closing parenthesis of an if, for, or while statement. For example:

        { print $1,
                $2 }

The requirement that awk add a trailing <newline> to the program argument text is to simplify the grammar, making it match a text file in form. There is no way for an application or test suite to determine whether a literal <newline> is added or whether awk simply acts as if it did.
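The split between CONVFMT and OFMT, forcing a numeric comparison by adding zero, and a <newline> after && can be demonstrated together. This is a sketch assuming a POSIX sh and an awk that implements CONVFMT as described above. The name conversion_demo is hypothetical.

```sh
#!/bin/sh
# Sketch only: CONVFMT vs OFMT, forcing a numeric comparison, and a <newline>
# after &&. conversion_demo is a hypothetical name.
conversion_demo() {
    awk 'BEGIN {
        CONVFMT = "%.2f"; OFMT = "%.4f"
        x = 3.14159
        s = x ""                 # internal number-to-string conversion: CONVFMT
        print s                  # prints 3.14
        print x                  # output conversion: OFMT, prints 3.1416
        y[1.5] = 1
        OFMT = "%e"
        print y[1.5]             # subscripts use CONVFMT, not OFMT: prints 1
        a = "+2"; b = 2
        print (a == b)           # a holds a string constant: string comparison, 0
        print (a + 0 == b)       # adding zero forces a numeric comparison, 1
        if (x > 3 &&
            x < 4)               # <newline> after && needs no backslash
            print "in range"
    }'
}

conversion_demo
```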
Because the concatenation operation is represented by adjacent expressions rather than an explicit operator, it is often necessary to use parentheses to enforce the proper evaluation precedence.

The overall awk syntax has always been based on the C language, with a few features from the shell command language and other sources. Because of this, it is not completely compatible with any other language, which has caused confusion for some users. It is not the intent of this standard to address such issues. The standard has made a few relatively minor changes toward making the language more compatible with the C language as specified by the C Standard {7}; most of these changes are based on similar changes in recent implementations, as described above. There remain several C language conventions that are not in awk. One of the notable ones is the comma operator, which is commonly used to specify multiple expressions in the C language for statement. Also, there are various places where awk is more restrictive than the C language regarding the type of expression that can be used in a given context. These limitations are due to the different features that the awk language does provide.

This standard requires several changes from traditional implementations in order to support internationalization. Probably the most subtle of these is the use of the decimal-point character, defined by the LC_NUMERIC category of the locale, in representations of floating point numbers. This locale-specific character is used in recognizing numeric input, in converting between strings and numeric values, and in formatting output.
However, regardless of locale, the period character (the decimal-point character of the POSIX Locale) is the decimal-point character recognized in processing awk programs (including assignments in command-line arguments). This is essentially the same convention as the one used in the C Standard {7}. The difference is that the C language includes the setlocale() function, which permits an application to modify its locale. Because of this capability, a C application begins executing with its locale set to the C locale, and only executes in the environment-specified locale after an explicit call to setlocale(). However, adding such an elaborate new feature to the awk language was seen as inappropriate for POSIX.2. It is possible to explicitly execute an awk program in any desired locale by setting the environment in the shell.

The behavior in the case of invalid awk programs (including lexical, syntactic, and semantic errors) is undefined because it was considered overly limiting on implementations to specify. In most cases such errors can be expected to produce a diagnostic and a nonzero exit status. However, some implementations may choose to extend the language in ways that make use of certain invalid constructs. Other invalid constructs might be deemed worthy of a warning but otherwise cause some reasonable behavior. Still other constructs may be very difficult to detect in some implementations. Also, different implementations might detect a given error during an initial parsing of the program (before reading any input files) while others might detect it when executing the program after reading some input. Implementors should be aware that diagnosing errors as early as possible and producing useful diagnostics can ease debugging of applications, and thus make an implementation more usable.

The unspecified behavior from using multicharacter RS values is to allow possible future extensions based on regular expressions used for record separators.
Historical implementations take the first character of the string and ignore the others. The undefined behavior resulting from NULs in regular expressions allows future extensions for the GNU gawk program to process binary data. The behavior when split(string, array, "") is used (that is, with a null string as the separator) is unspecified to allow a proposed future extension that would split up a string into an array of individual characters.

END_RATIONALE

4.2 basename - Return nondirectory portion of pathname

4.2.1 Synopsis

    basename string [suffix]

4.2.2 Description

The string operand shall be treated as a pathname, as defined in 2.2.2.102. The string string shall be converted to the filename corresponding to the last pathname component in string, and then the suffix string suffix, if present, shall be removed. This shall be done by performing actions equivalent to the following steps in order:

(1) If string is //, it is implementation defined whether steps (2) through (5) are skipped or processed.

(2) If string consists entirely of slash characters, string shall be set to a single slash character. In this case, skip steps (3) through (5).

(3) If there are any trailing slash characters in string, they shall be removed.

(4) If there are any slash characters remaining in string, the prefix of string up to and including the last slash character in string shall be removed.

(5) If the suffix operand is present, is not identical to the characters remaining in string, and is identical to a suffix of the characters remaining in string, the suffix suffix shall be removed from string. Otherwise, string shall not be modified by this step.
It shall not be considered an error if suffix is not found in string. The resulting string shall be written to standard output.

4.2.3 Options

None.

4.2.4 Operands

The following operands shall be supported by the implementation:

string
    A string.

suffix
    A string.

4.2.5 External Influences

4.2.5.1 Standard Input

None.

4.2.5.2 Input Files

None.

4.2.5.3 Environment Variables

The following environment variables shall affect the execution of basename:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.2.5.4 Asynchronous Events

Default.

4.2.6 External Effects

4.2.6.1 Standard Output

The basename utility shall write a line to the standard output in the following format:

    "%s\n", <resulting string>

4.2.6.2 Standard Error

Used only for diagnostic messages.

4.2.6.3 Output Files

None.

4.2.7 Extended Description

None.

4.2.8 Exit Status

The basename utility shall exit with one of the following values:

0
    Successful completion.

>0
    An error occurred.
4.2.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.2.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

If the string string is a valid pathname,

    $(basename "string")

produces a filename that could be used to open the file named by string in the directory returned by

    $(dirname "string")

If the string string is not a valid pathname, the same algorithm is used, but the result need not be a valid filename. The basename utility is not expected to make any judgments about the validity of string as a pathname; it just follows the specified algorithm to produce a result string.

The following shell script compiles /usr/src/cmd/cat.c and moves the output to a file named cat in the current directory when invoked with the argument /usr/src/cmd/cat or with the argument /usr/src/cmd/cat.c:

    c89 "$(dirname "$1")/$(basename "$1" .c).c"
    mv a.out "$(basename "$1" .c)"

History of Decisions Made

The POSIX.1 {8} definition of pathname allows trailing slashes on a pathname naming a directory. Some historical implementations have not allowed trailing slashes and thus treated pathnames of this form in other ways. Existing implementations also differ in their handling of suffix when suffix matches the entire string left after removing the directory part of string.

The behaviors of basename and dirname in this standard have been coordinated so that when string is a valid pathname

    $(basename "string")

would be a valid filename for the file in the directory

    $(dirname "string")

This would not work for the versions of these utilities in earlier drafts because of the way those drafts specified the handling of trailing slashes.
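The five steps in 4.2.2 can be traced with a short Python sketch. It is illustrative only and not part of this standard: the function name posix_basename is hypothetical, and so is the skip_for_double_slash parameter, which stands in for the implementation-defined choice in step (1).

```python
from typing import Optional


def posix_basename(string: str, suffix: Optional[str] = None,
                   skip_for_double_slash: bool = False) -> str:
    """Apply steps (1) through (5) of 4.2.2, in order, to string."""
    # Step (1): whether "//" skips steps (2)-(5) is implementation defined;
    # skip_for_double_slash is a hypothetical switch selecting that choice.
    if string == "//" and skip_for_double_slash:
        return string
    # Step (2): a string made up only of slashes becomes a single slash,
    # and steps (3) through (5) are skipped.
    if string and string.strip("/") == "":
        return "/"
    # Step (3): remove any trailing slashes.
    string = string.rstrip("/")
    # Step (4): remove the prefix up to and including the last slash
    # (rfind returns -1 when no slash remains, so nothing is removed).
    string = string[string.rfind("/") + 1:]
    # Step (5): remove suffix only if it is a proper suffix of what is left;
    # an empty suffix would remove nothing, so it is simply skipped.
    if suffix and suffix != string and string.endswith(suffix):
        string = string[:-len(suffix)]
    return string
```

With these definitions, posix_basename("/usr/src/cmd/cat.c", ".c") and posix_basename("/usr/src/cmd/cat", ".c") both return "cat", matching the c89 example above, and posix_basename("/usr/lib/") returns "lib". By contrast, Python's own os.path.basename("/usr/lib/") returns an empty string because it does not remove trailing slashes first, which is why the steps are written out explicitly here.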
Since the definition of pathname in 2.2.2.102 specifies implementation-defined behavior for pathnames starting with two slash characters, Draft 11 has been changed to specify similar implementation-defined behavior for the basename and dirname utilities. On implementations where the pathname // is always treated the same as the pathname /, the functionality required by Draft 10 meets all of the Draft 11 requirements.

END_RATIONALE

4.3 bc - Arbitrary-precision arithmetic language

4.3.1 Synopsis

    bc [-l] [file ...]

4.3.2 Description

The bc utility shall implement an arbitrary-precision calculator. It shall take input from any files given, then read from the standard input. If the standard input and standard output to bc are attached to a terminal, the invocation of bc shall be considered to be interactive, causing behavioral constraints described in the following subclauses.

4.3.3 Options

The bc utility shall conform to the utility argument syntax guidelines described in 2.10.2. The following option shall be supported by the implementation:

-l
    (The letter ell.) Define the math functions and initialize scale to 20, instead of the default zero. See 4.3.7.

4.3.4 Operands

The following operands shall be supported by the implementation:

file
    A pathname of a text file containing bc program statements. After all files have been read, bc shall read the standard input.

4.3.5 External Influences

4.3.5.1 Standard Input

See Input Files.
4.3.5.2 Input Files

Input files shall be text files containing a sequence of comments, statements, and function definitions that shall be executed as they are read.

4.3.5.3 Environment Variables

The following environment variables shall affect the execution of bc:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.3.5.4 Asynchronous Events

Default.

4.3.6 External Effects

4.3.6.1 Standard Output

The output of the bc utility shall be controlled by the program read, and shall consist of zero or more lines containing the value of all executed expressions without assignments. The radix and precision of the output shall be controlled by the values of the obase and scale variables. See 4.3.7.

4.3.6.2 Standard Error

Used only for diagnostic messages.

4.3.6.3 Output Files

None.

4.3.7 Extended Description

4.3.7.1 bc Grammar

The grammar in this subclause and the lexical conventions in the following subclause shall together describe the syntax for bc programs. The general conventions for this style of grammar are described in 2.1.2. A valid program can be represented as the nonterminal symbol program in the grammar.
Any discrepancies found between this grammar and other descriptions in this subclause (4.3.7) shall be resolved in favor of this grammar.

    %token  EOF NEWLINE STRING LETTER NUMBER
    %token  MUL_OP                     /* '*', '/', '%' */
    %token  ASSIGN_OP                  /* '=', '+=', '-=', '*=', '/=', '%=', '^=' */
    %token  REL_OP                     /* '==', '<=', '>=', '!=', '<', '>' */
    %token  INCR_DECR                  /* '++', '--' */
    %token  Define Break Quit Length   /* 'define', 'break', 'quit', 'length' */
    %token  Return For If While Sqrt   /* 'return', 'for', 'if', 'while', 'sqrt' */
    %token  Scale Ibase Obase Auto     /* 'scale', 'ibase', 'obase', 'auto' */

    %start  program

    %%

    program               : EOF
                          | input_item program
                          ;

    input_item            : semicolon_list NEWLINE
                          | function
                          ;

    semicolon_list        : /* empty */
                          | statement
                          | semicolon_list ';' statement
                          | semicolon_list ';'
                          ;

    statement_list        : /* empty */
                          | statement
                          | statement_list NEWLINE
                          | statement_list NEWLINE statement
                          | statement_list ';'
                          | statement_list ';' statement
                          ;

    statement             : expression
                          | STRING
                          | Break
                          | Quit
                          | Return
                          | Return '(' return_expression ')'
                          | For '(' expression ';'
                                relational_expression ';'
                                expression ')' statement
                          | If '(' relational_expression ')' statement
                          | While '(' relational_expression ')' statement
                          | '{' statement_list '}'
                          ;

    function              : Define LETTER '(' opt_parameter_list ')'
                                '{' NEWLINE opt_auto_define_list
                                statement_list '}'
                          ;

    opt_parameter_list    : /* empty */
                          | parameter_list
                          ;

    parameter_list        : LETTER
                          | define_list ',' LETTER
                          ;

    opt_auto_define_list  : /* empty */
                          | Auto define_list NEWLINE
                          | Auto define_list ';'
                          ;

    define_list           : LETTER
                          | LETTER '[' ']'
                          | define_list ',' LETTER
                          | define_list ',' LETTER '[' ']'
                          ;

    opt_argument_list     : /* empty */
                          | argument_list
                          ;

    argument_list         : expression
                          | argument_list ',' expression
                          ;

    relational_expression : expression
                          | expression REL_OP expression
                          ;

    return_expression     : /* empty */
                          | expression
                          ;

    expression            : named_expression
                          | NUMBER
                          | '(' expression ')'
                          | LETTER '(' opt_argument_list ')'
                          | '-' expression
                          | expression '+' expression
                          | expression '-' expression
                          | expression MUL_OP expression
                          | expression '^' expression
                          | INCR_DECR named_expression
                          | named_expression INCR_DECR
                          | named_expression ASSIGN_OP expression
                          | Length '(' expression ')'
                          | Sqrt '(' expression ')'
                          | Scale '(' expression ')'
                          ;

    named_expression      : LETTER
                          | LETTER '[' expression ']'
                          | Scale
                          | Ibase
                          | Obase
                          ;

4.3.7.2 bc Lexical Conventions

The lexical conventions for bc programs, with respect to the preceding grammar, shall be as follows:

(1) Except as noted, bc shall recognize the longest possible token or delimiter beginning at a given point.

(2) A comment shall consist of any characters beginning with the two adjacent characters /* and terminated by the next occurrence of the two adjacent characters */. Comments shall have no effect except to delimit lexical tokens.

(3) The <newline> character shall be recognized as the token NEWLINE.
(4) The token STRING shall represent a string constant; it shall consist of any characters beginning with the double-quote character (") and terminated by another occurrence of the double-quote character. The value of the string shall be the sequence of all characters between, but not including, the two double-quote characters. All characters shall be taken literally from the input, and there is no way to specify a string containing a double-quote character. The length of the value of each string shall be limited to {BC_STRING_MAX} bytes.

(5) A <blank> shall have no effect except as an ordinary character if it appears within a STRING token, or to delimit a lexical token other than STRING.

(6) The combination of a backslash character immediately followed by a <newline> character shall delimit lexical tokens with the following exceptions:

    - It shall be interpreted literally in STRING tokens.

    - It shall be ignored as part of a multiline NUMBER token.

(7) The token NUMBER shall represent a numeric constant. It shall be recognized by the following grammar:

    NUMBER  : integer
            | '.' integer
            | integer '.'
            | integer '.' integer
            ;

    integer : digit
            | integer digit
            ;

    digit   : 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
            | 8 | 9 | A | B | C | D | E | F
            ;

(8) The value of a NUMBER token shall be interpreted as a numeral in the base specified by the value of the internal register ibase (described below). Each of the digit characters shall have the value from 0 to 15 in the order listed here, and the period character shall represent the radix point. The behavior is undefined if digits greater than or equal to the value of ibase appear in the token. (However, note the exception for single-digit values being assigned to ibase and obase themselves, in 4.3.7.3.)
(9) The following keywords shall be recognized as tokens:

    auto     for      length   return   sqrt
    break    ibase    obase    scale    while
    define   if       quit

(10) Any of the following characters occurring anywhere except within a keyword shall be recognized as the token LETTER:

    a b c d e f g h i j k l m n o p q r s t u v w x y z

(11) The following single-character and two-character sequences shall be recognized as the token ASSIGN_OP:

    =   +=   -=   *=   /=   %=   ^=

(12) If an = character, as the beginning of a token, is followed by a - character with no intervening delimiter, the behavior is undefined.

(13) The following single characters shall be recognized as the token MUL_OP:

    *   /   %

(14) The following single-character and two-character sequences shall be recognized as the token REL_OP:

    ==   <=   >=   !=   <   >

(15) The following two-character sequences shall be recognized as the token INCR_DECR:

    ++   --

(16) The following single characters shall be recognized as tokens whose names are the character:

    (   )   ,   +   -   ;   [   ]   ^   {   }

(17) The token EOF shall be returned when the end of input is reached.

4.3.7.3 bc Operations

There are three kinds of identifiers: ordinary identifiers, array identifiers, and function identifiers. All three types consist of single lowercase letters. Array identifiers shall be followed by square brackets ([ ]). An array subscript is required except in an argument or auto list. Arrays are singly dimensioned and can contain up to {BC_DIM_MAX} elements. Indexing begins at zero, so an array is indexed from 0 to {BC_DIM_MAX}-1. Subscripts shall be truncated to integers. Function identifiers shall be followed by parentheses, possibly enclosing arguments. The three types of identifiers do not conflict.

Table 4-3 summarizes the rules for precedence and associativity of all operators.
Operators on the same line shall have the same precedence; rows are in order of decreasing precedence.

                    Table 4-3 - bc Operators

    Operator                        Associativity
    ---------------------------------------------
    ++, --                          not applicable
    unary -                         not applicable
    ^                               right to left
    *, /, %                         left to right
    +, binary -                     left to right
    =, +=, -=, *=, /=, %=, ^=       right to left
    ==, <=, >=, !=, <, >            none
    ---------------------------------------------

Each expression or named expression has a scale, which is the number of decimal digits that shall be maintained as the fractional portion of the expression.

Named expressions are places where values are stored. Named expressions shall be valid on the left side of an assignment. The value of a named expression shall be the value stored in the place named. Simple identifiers and array elements shall be named expressions; they shall have an initial value of zero and an initial scale of zero.

The internal registers scale, ibase, and obase are all named expressions. The scale of an expression consisting of the name of one of these registers shall be zero; values assigned to any of these registers shall be truncated to integers. The scale register shall contain a global value used in computing the scale of expressions (as described below). The value of the register scale shall be limited to 0 <= scale <= {BC_SCALE_MAX} and shall have a default value of zero. The ibase and obase registers are the input and output number radix, respectively.
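The interpretation of a NUMBER token in item (8) of 4.3.7.2, including the single-digit exception for values assigned to ibase and obase that the item refers to, can be sketched in Python as follows. This is a hedged illustration, not a reference implementation: the names parse_number, digit_value, and register_value are hypothetical, exact rational values stand in for bc's internal decimal arithmetic, and rejecting out-of-range digits is only one possible response to behavior the standard leaves undefined.

```python
from fractions import Fraction
from typing import Tuple

# Digit characters from item (7), with values 0 through 15 in this order.
DIGITS = "0123456789ABCDEF"


def digit_value(ch: str, ibase: int) -> int:
    value = DIGITS.find(ch)
    if value < 0:
        raise ValueError(f"{ch!r} is not a bc digit")
    if value >= ibase:
        # Undefined in bc; this sketch chooses to reject the token.
        raise ValueError(f"digit {ch} is not valid when ibase is {ibase}")
    return value


def parse_number(token: str, ibase: int = 10) -> Tuple[Fraction, int]:
    """Return (value, scale) for a NUMBER token read with the given ibase."""
    # Item (6): a backslash-newline pair inside a NUMBER is ignored.
    token = token.replace("\\\n", "")
    int_part, _, frac_part = token.partition(".")
    value = Fraction(0)
    for ch in int_part:
        value = value * ibase + digit_value(ch, ibase)
    weight = Fraction(1, ibase)
    for ch in frac_part:
        value += digit_value(ch, ibase) * weight
        weight /= ibase
    # The scale of a constant is the number of digits after the radix point.
    return value, len(frac_part)


def register_value(token: str, ibase: int) -> int:
    """Value stored when a NUMBER token is assigned to ibase or obase."""
    if len(token) == 1:
        # Single-digit exception: read as hexadecimal whatever ibase is.
        return digit_value(token, 16)
    value, _ = parse_number(token, ibase)
    return int(value)  # values assigned to registers are truncated
```

The sketch also shows a common pitfall: once ibase is 16, the token 10 denotes sixteen, so register_value("10", 16) is 16 and ibase=10 leaves the input base unchanged, whereas the single-digit form ibase=A always restores base ten.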
The value of ibase shall be limited to

    2 <= ibase <= 16

The value of obase shall be limited to

    2 <= obase <= {BC_BASE_MAX}

When either ibase or obase is assigned a single-digit value from the list in 4.3.7.2, the value shall be assumed to be hexadecimal. (For example, ibase=A sets the input base to ten, regardless of the current ibase value.) Otherwise, the behavior is undefined when digits greater than or equal to the value of ibase appear in the input. Both ibase and obase shall have initial values of 10.

Internal computations shall be conducted as if in decimal, regardless of the input and output bases, to the specified number of decimal digits. When an exact result is not achieved (e.g., scale=0; 3.2/1), the result shall be truncated.

For all values of obase specified by this standard, numerical values shall be output as follows:

(1) If the value is less than zero, a hyphen (-) character shall be output.

(2) One of the following shall be output, depending on the numerical value:

    - If the absolute value of the numerical value is greater than or equal to one, the integer portion of the value shall be output as a series of digits appropriate to obase (as described below). The most significant nonzero digit shall be output next, followed by each successively less significant digit.

    - If the absolute value of the numerical value is less than one but greater than zero and the scale of the numerical value is greater than zero, it is unspecified whether the character 0 is output.

    - If the numerical value is zero, the character 0 shall be output.

(3) If the scale of the value is greater than zero, a period character shall be output, followed by a series of digits appropriate to obase (as described below) representing the most significant portion of the fractional part of the value.
If s represents the scale of the value being output, the number of digits output shall be s if obase is 10, less than or equal to s if obase is greater than 10, or greater than or equal to s if obase is less than 10. For obase values other than 10, this should be the number of digits needed to represent a precision of 10^s.

For obase values from 2 to 16, valid digits are the first obase of the single characters

    0 1 2 3 4 5 6 7 8 9 A B C D E F

which represent the values zero through fifteen, respectively.

For bases greater than 16, each "digit" shall be written as a separate multidigit decimal number. Each digit except the most significant fractional digit shall be preceded by a single <space> character. For bases from 17 to 100, bc shall write two-digit decimal numbers; for bases from 101 to 999, three-digit decimal strings, and so on. For example, the decimal number 1024 in base 25 would be written as:

    <space>01<space>15<space>24

and in base 125, as:

    <space>008<space>024

Very large numbers shall be split across lines with 70 characters per line in the POSIX Locale; other locales may split at different character boundaries. Lines that are continued shall end with a backslash (\).

A function call shall consist of a function name followed by parentheses containing a comma-separated list of expressions, which are the function arguments. A whole array passed as an argument shall be specified by the array name followed by empty square brackets. All function arguments shall be passed by value. As a result, changes made to the formal parameters have no effect on the actual arguments.
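The integer part of the output rules, including the space-separated multidigit form used for bases above 16, can be sketched as below. The function name format_integer is hypothetical. The sketch handles only integer values, omits fractional digits and the splitting of long lines, and assumes that the width of each multidigit decimal "digit" is the number of decimal digits in obase - 1, which agrees with the ranges stated above for bases 17 through 999.

```python
HEX_DIGITS = "0123456789ABCDEF"


def format_integer(value: int, obase: int) -> str:
    """Format an integer as bc output in the given obase (sketch only)."""
    if obase < 2:
        raise ValueError("obase must be at least 2")
    if value == 0:
        return "0"                        # rule (2): zero is written as 0
    sign = "-" if value < 0 else ""       # rule (1): hyphen for negatives
    n = abs(value)
    digits = []
    while n:
        n, d = divmod(n, obase)
        digits.append(d)
    digits.reverse()                      # most significant digit first
    if obase <= 16:
        return sign + "".join(HEX_DIGITS[d] for d in digits)
    # Bases above 16: each digit is a zero-padded decimal number preceded
    # by one space; width = decimal digits needed for obase - 1 (assumption).
    width = len(str(obase - 1))
    return sign + "".join(" " + str(d).zfill(width) for d in digits)
```

For example, format_integer(1024, 25) returns " 01 15 24" and format_integer(1024, 125) returns " 008 024", matching the two examples above with each <space> written as an actual space.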
If the function terminates by executing a return statement, the value of the function shall be the value of the expression in the parentheses of the return statement, or zero if no expression is provided. If the function terminates without executing a return statement, its value shall be zero.

The result of sqrt(expression) shall be the square root of the expression. The result shall be truncated in the least significant decimal place. The scale of the result shall be the scale of the expression or the value of scale, whichever is larger.

The result of length(expression) shall be the total number of significant decimal digits in the expression. The scale of the result shall be zero.

The result of scale(expression) shall be the scale of the expression. The scale of the result shall be zero.

A numeric constant shall be an expression. The scale shall be the number of digits that follow the radix point in the input representing the constant, or zero if no radix point appears.

The sequence ( expression ) shall be an expression with the same value and scale as expression. The parentheses can be used to alter the normal precedence.

The semantics of the unary and binary operators are as follows.

-expression
    The result shall be the negative of the expression. The scale of the result shall be the scale of expression.
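The sqrt rule above can be sketched with exact integer arithmetic. The sketch assumes a hypothetical representation of a bc number as a pair (mantissa, scale) whose value is mantissa / 10**scale. Under that assumption, the root truncated at scale r is the integer square root of mantissa * 10**(2r - scale), computed here with math.isqrt (Python 3.8 or later). The names bc_sqrt and to_text are illustrative, and rejecting negative arguments is a choice made by this sketch.

```python
from math import isqrt
from typing import Tuple

# Hypothetical representation: (mantissa, scale) has value mantissa / 10**scale.
BcNumber = Tuple[int, int]


def bc_sqrt(x: BcNumber, scale: int) -> BcNumber:
    """Square root truncated at scale max(scale, scale of x)."""
    mantissa, x_scale = x
    if mantissa < 0:
        raise ValueError("square root of a negative number")  # sketch's choice
    r = max(scale, x_scale)
    # floor(sqrt(m / 10**s) * 10**r) == isqrt(m * 10**(2*r - s)); 2*r - s >= 0
    return isqrt(mantissa * 10 ** (2 * r - x_scale)), r


def to_text(x: BcNumber) -> str:
    """Render a (mantissa, scale) pair in decimal with exactly `scale` digits."""
    mantissa, scale = x
    sign = "-" if mantissa < 0 else ""
    digits = str(abs(mantissa))
    if scale == 0:
        return sign + digits
    digits = digits.rjust(scale + 1, "0")  # keep at least one integer digit
    return f"{sign}{digits[:-scale]}.{digits[-scale:]}"
```

For example, bc_sqrt((2, 0), 0) gives (1, 0), while bc_sqrt((2, 0), 4) gives (14142, 4), which to_text renders as 1.4142. Note that to_text always writes a 0 before the radix point for values less than one, a detail the output rules in this subclause leave unspecified.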
The unary increment and decrement operators shall not modify the scale of the named expression upon which they operate. The scale of the result shall be the scale of that named expression.

++named-expression
    The named expression shall be incremented by one. The result shall be the value of the named expression after incrementing.

--named-expression
    The named expression shall be decremented by one. The result shall be the value of the named expression after decrementing.

named-expression++
    The named expression shall be incremented by one. The result shall be the value of the named expression before incrementing.

named-expression--
    The named expression shall be decremented by one. The result shall be the value of the named expression before decrementing.

The exponentiation operator, circumflex (^), shall bind right to left.

expression^expression
    The result shall be the first expression raised to the power of the second expression. If the second expression is not an integer, the behavior is undefined. If a is the scale of the left expression and b is the value of the right expression, the scale of the result shall be:

        if b >= 0:   min(a * b, max(scale, a))
        if b < 0:    scale

The multiplicative operators (*, /, %) shall bind left to right.

expression * expression
    The result shall be the product of the two expressions. If a and b are the scales of the two expressions, then the scale of the result shall be:

        min(a + b, max(scale, a, b))

expression / expression
    The result shall be the quotient of the two expressions. The scale of the result shall be the value of scale.
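The scale rules for ^, *, and / given above can be checked with a small Python model. As in the other sketches, a number is assumed to be a hypothetical (mantissa, scale) pair with value mantissa / 10**scale, and the function names are illustrative. Each operation computes the exact result with Python integers and truncates toward zero once at the result scale. An actual bc may organize its intermediate computations differently, so this sketch is not claimed to match any implementation digit for digit.

```python
from typing import Tuple

# Hypothetical representation: (mantissa, scale) has value mantissa / 10**scale.
BcNumber = Tuple[int, int]


def _truncdiv(n: int, d: int) -> int:
    """Integer division truncated toward zero."""
    q = abs(n) // abs(d)
    return q if (n < 0) == (d < 0) else -q


def bc_mul(x: BcNumber, y: BcNumber, scale: int) -> BcNumber:
    (mx, a), (my, b) = x, y
    r = min(a + b, max(scale, a, b))
    # The exact product has scale a + b; drop the extra digits by truncation.
    return _truncdiv(mx * my, 10 ** (a + b - r)), r


def bc_div(x: BcNumber, y: BcNumber, scale: int) -> BcNumber:
    (mx, a), (my, b) = x, y
    if my == 0:
        raise ZeroDivisionError("divide by zero")
    # (mx / 10**a) / (my / 10**b), scaled by 10**scale and truncated.
    return _truncdiv(mx * 10 ** (b + scale), my * 10 ** a), scale


def bc_pow(x: BcNumber, exponent: int, scale: int) -> BcNumber:
    mx, a = x
    if exponent >= 0:
        r = min(a * exponent, max(scale, a))
        return _truncdiv(mx ** exponent, 10 ** (a * exponent - r)), r
    if mx == 0:
        raise ZeroDivisionError("divide by zero")
    k = -exponent
    # 1 / (mx / 10**a)**k at the current scale, truncated.
    return _truncdiv(10 ** (a * k + scale), mx ** k), scale
```

With scale=0, bc_mul((125, 2), (125, 2), 0) gives (156, 2), that is 1.56 rather than 1.5625, because the result scale is min(4, max(0, 2, 2)) = 2. Likewise, bc_div((32, 1), (1, 0), 0) gives (3, 0), the truncated result of the 3.2/1 example given earlier in this subclause.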
expression % expression
    For expressions a and b, a % b shall be evaluated as if by the following steps:

    (1) Compute a/b to the current scale.

    (2) Use the result to compute

            a - (a / b) * b

        to scale

            max(scale + scale(b), scale(a))

    The scale of the result shall be

        max(scale + scale(b), scale(a))

The additive operators (+, -) shall bind left to right.

expression + expression
    The result shall be the sum of the two expressions. The scale of the result shall be the maximum of the scales of the expressions.

expression - expression
    The result shall be the difference of the two expressions. The scale of the result shall be the maximum of the scales of the expressions.

The assignment operators (=, +=, -=, *=, /=, %=, ^=) shall bind right to left.

named-expression = expression
    This expression results in assigning the value of the expression on the right to the named expression on the left. The scale of both the named expression and the result shall be the scale of expression.

The compound assignment forms

    named-expression <operator>= expression

shall be equivalent to:

    named-expression = named-expression <operator> expression

except that the named-expression shall be evaluated only once.

Unlike all other operators, the relational operators (<, >, <=, >=, ==, !=) shall be valid only as the object of an if or while statement, or inside a for statement.
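The two-step evaluation of a % b described above can be modelled in the same way. This sketch again assumes a hypothetical (mantissa, scale) representation and uses an illustrative function name, bc_mod. Step (1) truncates toward zero, and step (2) is exact because both a and (a / b) * b can be represented at the result scale.

```python
from typing import Tuple

# Hypothetical representation: (mantissa, scale) has value mantissa / 10**scale.
BcNumber = Tuple[int, int]


def _truncdiv(n: int, d: int) -> int:
    """Integer division truncated toward zero."""
    q = abs(n) // abs(d)
    return q if (n < 0) == (d < 0) else -q


def bc_mod(x: BcNumber, y: BcNumber, scale: int) -> BcNumber:
    (ma, sa), (mb, sb) = x, y
    if mb == 0:
        raise ZeroDivisionError("divide by zero")
    # Step (1): q = a / b, truncated to the current scale.
    q = _truncdiv(ma * 10 ** (sb + scale), mb * 10 ** sa)
    # Step (2): a - q * b at scale max(scale + scale(b), scale(a)).
    # q * b has scale `scale + sb`, so both terms fit exactly at scale r.
    r = max(scale + sb, sa)
    diff = ma * 10 ** (r - sa) - q * mb * 10 ** (r - scale - sb)
    return diff, r
```

At scale=0 the operator behaves like an integer remainder: bc_mod((7, 0), (3, 0), 0) is (1, 0). At scale=2 the same operands give (1, 2), that is .01, because step (1) yields 2.33 and 7 - 2.33 * 3 = .01.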
expression1 < expression2
    The relation shall be true if the value of expression1 is strictly less than the value of expression2.

expression1 > expression2
    The relation shall be true if the value of expression1 is strictly greater than the value of expression2.

expression1 <= expression2
    The relation shall be true if the value of expression1 is less than or equal to the value of expression2.

expression1 >= expression2
    The relation shall be true if the value of expression1 is greater than or equal to the value of expression2.

expression1 == expression2
    The relation shall be true if the values of expression1 and expression2 are equal.

expression1 != expression2
    The relation shall be true if the values of expression1 and expression2 are unequal.

There are only two storage classes in bc: global and automatic (local). Only identifiers that are to be local to a function need be declared with the auto statement. The arguments to a function shall be local to the function. All other identifiers are assumed to be global and available to all functions. All identifiers, global and local, have initial values of zero. Identifiers declared as auto shall be allocated on entry to the function and released on returning from the function. They therefore do not retain values between function calls. Auto arrays shall be specified by the array name followed by empty square brackets.

On entry to a function, the old values of the names that appear as parameters and as automatic variables are pushed onto a stack. Until return is made from the function, reference to these names refers only to the new values.
References to any of these names from other functions that are called from this function also refer to the new value until one of those functions uses the same name for a local variable.

When a statement is an expression, unless the main operator is an assignment, execution of the statement shall write the value of the expression followed by a <newline> character.

When a statement is a string, execution of the statement shall write the value of the string.

Statements separated by semicolon or <newline> shall be executed sequentially. In an interactive invocation of bc, each time a <newline> character is read that satisfies the grammatical production

    input_item : semicolon_list NEWLINE

the sequential list of statements making up the semicolon_list shall be executed immediately, and any output produced by that execution shall be written without any delay due to buffering.

In an if statement [if (relation) statement], the statement shall be executed if the relation is true.

The while statement [while (relation) statement] implements a loop in which the relation is tested; each time the relation is true, the statement shall be executed and the relation retested. When the relation is false, execution shall resume after statement.

A for statement [for (expression; relation; expression) statement] shall be the same as:

    first-expression
    while (relation) {
        statement
        last-expression
    }

All three expressions shall be present.

The break statement causes termination of a for or while statement.

The auto statement [auto identifier[,identifier] ...]
shall cause the values of the identifiers to be pushed down. The identifiers can be ordinary identifiers or array identifiers. Array identifiers shall be specified by following the array name with empty square brackets. The auto statement shall be the first statement in a function definition.

A define statement:

    define LETTER ( opt_parameter_list ) {
        opt_auto_define_list
        statement_list
    }

defines a function named LETTER. If a function named LETTER was previously defined, the define statement shall replace the previous definition. The expression

    LETTER ( opt_argument_list )

shall invoke the function named LETTER. The behavior is undefined if the number of arguments in the invocation does not match the number of parameters in the definition. Functions shall be defined before they are invoked. A function shall be considered to be defined within its own body, so recursive calls shall be valid. The values of numeric constants within a function shall be interpreted in the base specified by the value of the ibase register when the function is invoked.

The return statements [return and return(expression)] shall cause termination of a function, popping of its auto variables, and specify the result of the function. The first form shall be equivalent to return(0). The value and scale of an invocation of the function shall be the value and scale of the expression in parentheses.
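Illustration only (not part of the normative text), again assuming a bc utility in PATH. In the hypothetical shell function fact_demo, the bc function f calls itself (valid because a function is considered defined within its own body), g computes the same factorial with auto variables and a for statement, and z uses the bare return form, which is equivalent to return(0).

```shell
# Hypothetical helper: define three bc functions and invoke each once.
# f: recursive factorial; g: iterative factorial using auto and for;
# z: bare "return", which yields 0.
fact_demo() {
    printf '%s\n' \
        'define f(n) {' \
        '    if (n <= 1) return (1)' \
        '    return (n * f(n - 1))' \
        '}' \
        'define g(n) {' \
        '    auto i, p' \
        '    p = 1' \
        '    for (i = 2; i <= n; i++) p = p * i' \
        '    return (p)' \
        '}' \
        'define z() {' \
        '    return' \
        '}' \
        'f(10)' \
        'g(10)' \
        'z()' | bc
}
```

Invoking fact_demo should write 3628800 twice and then 0. Each function body begins on the line after its opening brace, as the define grammar above requires.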
The quit statement (quit) shall stop execution of a bc program at the point where the statement occurs in the input, even if it occurs in a function definition, or in an if, for, or while statement.

The following functions shall be defined when the -l option is specified:

    s ( expression )                 Sine of argument in radians
    c ( expression )                 Cosine of argument in radians
    a ( expression )                 Arctangent of argument
    l ( expression )                 Natural logarithm of argument
    e ( expression )                 Exponential function of argument
    j ( expression , expression )    Bessel function of integer order

The scale of an invocation of each of these functions shall be the value of the scale register when the function is invoked. The behavior is undefined if any of these functions is invoked with an argument outside the domain of the mathematical function.

4.3.8 Exit Status

The bc utility shall exit with one of the following values:

    0             All input files were processed successfully.
    unspecified   An error occurred.

4.3.9 Consequences of Errors

If any file operand is specified and the named file cannot be accessed, bc shall write a diagnostic message to standard error and terminate without any further action.

In an interactive invocation of bc, the utility should print an error message and recover following any error in the input. In a noninteractive invocation of bc, invalid input causes undefined behavior.

BEGIN_RATIONALE

4.3.10 Rationale.
(This subclause is not a part of P1003.2)

Examples, Usage

This description is based on BC--An Arbitrary Precision Desk-Calculator Language by Lorinda Cherry and Robert Morris, in the BSD User Manual {B28}.

Automatic variables in bc do not work in exactly the same way as in either C or PL/1.

In the shell, the following assigns an approximation of the first ten digits of pi to the variable x:

    x=$(printf "%s\n" 'scale = 10; 104348/33215' | bc)

The following bc program prints the same approximation of pi, with a label, to standard output:

    scale = 10
    "pi equals "
    104348 / 33215

The following defines a function to compute an approximate value of the exponential function (note that such a function is predefined if the -l option is specified):

    scale = 20
    define e(x){
        auto a, b, c, i, s
        a = 1
        b = 1
        s = 1
        for (i = 1; 1 == 1; i++){
            a = a*x
            b = b*i
            c = a/b
            if (c == 0) {
                return(s)
            }
            s = s+c
        }
    }

The following prints approximate values of the exponential function of the first ten integers:

    for (i = 1; i <= 10; ++i) {
        e(i)
    }

History of Decisions Made

The bc utility is traditionally implemented as a front-end processor for dc; dc was not selected to be part of the standard because bc was thought to have a more intuitive programmatic interface. Current implementations that implement bc using dc are expected to be compliant.

The Exit Status for error conditions has been left unspecified for several reasons:

    (1) The bc utility is used in both interactive and noninteractive situations. Different exit codes may be appropriate for the two uses.

    (2) It is unclear when a nonzero exit should be given; divide-by-zero, undefined functions, and syntax errors are all possibilities.
    (3) It is not clear how useful an exit status would be to applications.

    (4) In the 4.3BSD, System V, and Ninth Edition implementations, bc works in conjunction with dc. dc is the parent, bc is the child. This was done to cleanly terminate bc if dc aborted.

The decision to have bc exit upon encountering an inaccessible input file is based on the belief that

    bc file1 file2

is used most often when at least file1 contains data/function declarations/initializations. Having bc continue with prerequisite files missing is probably not useful. There is no implication in the Consequences of Errors subclause that bc must check all its files for accessibility before opening any of them.

There was considerable debate on the appropriateness of the language accepted by bc. Several members of the balloting group preferred to see either a pure subset of the C language or some changes to make the language more compatible with C. While the bc language has some obvious similarities to C, it has never claimed to be compatible with any version of C. An interpreter for a subset of C might be a very worthwhile utility, and it could potentially make bc obsolete. However, no such utility is known in existing practice, and it was not within the scope of POSIX.2 to define such a language and utility. If and when they are defined, it may be appropriate to include them in a future revision of this standard. This left the following alternatives:

    (1) Exclude any calculator language from the standard. The consensus of the working group was that a simple programmatic calculator language is very useful. Also, an interactive version of such a calculator would be very important for the POSIX.2a revision.
    The only arguments for excluding any calculator were that it would become obsolete if and when a C-compatible one emerged, or that the absence would encourage the development of such a C-compatible one. These arguments did not sufficiently address the needs of current application writers.

    (2) Standardize the existing dc, possibly with minor modifications. The consensus of the working group was that dc is a fundamentally less usable language and that adopting it would be far too severe a penalty for avoiding the issue of being similar to but incompatible with C.

    (3) Standardize the existing bc, possibly with minor modifications. This was the approach taken. Most of the proponents of changing the language would not have been satisfied until most or all of the incompatibilities with C were resolved. Since most of the changes considered most desirable would break existing applications and require significant modification to existing implementations, almost no modifications were made. The one significant modification that was made was the replacement of the traditional bc's assignment operators =+ et al. with the more modern += et al. The older versions are considered to be fundamentally flawed because of the lexical ambiguity in uses like

        a=-1

    which can be read either as assigning -1 to a or as the traditional =- operator applied to 1. In order to permit implementations to deal with backward compatibility as they see fit, the behavior of this one ambiguous construct was made undefined. (At least three implementations have been known to support this change already, so the degree of change involved should not be great.)

The % operator is the mathematical remainder operator when scale is zero. The behavior of this operator for other values of scale is from traditional implementations of bc, and has been maintained for the sake of existing applications despite its nonintuitive nature.
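As an illustration of that nonintuitive behavior, assume the traditional algorithm: a/b is first computed to scale fractional digits, and the result is a-(a/b)*b. Under that assumption, 7 % 3 is 1 when scale is 0, but with scale set to 2 the quotient is 2.33 and the result is 7 - 6.99 = .01. The hypothetical shell function rem_demo shows both cases, assuming a bc utility in PATH.

```shell
# Hypothetical helper: evaluate the same remainder at two scales.
# With scale = 0 the result is the integer remainder; with scale = 2 the
# traditional algorithm leaves a small fractional remainder instead.
rem_demo() {
    printf '%s\n' 'scale = 0' '7 % 3' 'scale = 2' '7 % 3' | bc
}
```

On an implementation that follows the traditional algorithm, invoking rem_demo should write 1 and then .01.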
The bc utility always uses the period (.) character to represent a radix point, regardless of any decimal-point character specified as part of the current locale. In languages like C or awk, the period character is used in program source, so it can be portable and unambiguous, while the locale-specific character is used in input and output. Because there is no distinction between source and input in bc, this arrangement would not be possible. Using the locale-specific character in bc's input would introduce ambiguities into the language; consider the following example in a locale with a comma as the decimal-point character:

    define f(a,b) {
        ...
    }
    ...
    f(1,2,3)

Here f(1,2,3) could be read either as a call with the arguments 1,2 and 3 or as a call with the arguments 1 and 2,3. Because of such ambiguities, the period character is used in input. Having input follow different conventions from output would be confusing in either pipeline usage or interactive usage, so period is also used in output.

Traditional implementations permit setting ibase and obase to a broader range of values. This includes values less than 2, which were not seen as sufficiently useful to standardize. These implementations do not interpret input properly for values of ibase that are greater than 16. This is because numeric constants are recognized syntactically, rather than lexically, as described in the standard. They are built from lexical tokens of single hexadecimal digits and periods. Since <blank>s between tokens are not visible at the syntactic level, it is not possible to properly recognize the multidigit ``digits'' used in the higher bases. The ability to recognize input in these bases was not considered useful enough to require modifying these implementations. Note that the recognition of numeric constants at the syntactic level is not a problem
with conformance to the standard, as it does not impact the behavior of portable applications (and correct bc programs). Traditional implementations also accept input with all of the digits 0-9 and A-F regardless of the value of ibase; since digits with value greater than or equal to ibase are not really appropriate, the behavior when they appear is undefined, except for the common case of:

    ibase=8;
        /* Process in octal base */
    ...
    ibase=A
        /* Restore decimal base */

(Writing ibase=10 at that point would not restore decimal input, because the constant 10 is itself read in base 8 and denotes eight.)

In some historical implementations, if the expression to be written is an uninitialized array element, a leading <space> and/or up to four leading 0 characters may be output before the character zero. This behavior is considered a bug; it is unlikely that any currently portable application relies on

    echo 'b[3]' | bc

returning 00000 rather than 0.

Exact calculation of the number of fractional digits to output for a given value in a base other than 10 can be computationally expensive. Traditional implementations use a faster approximation, and this is permitted. Note that the requirements apply only to values of obase that the standard requires implementations to support (in particular, not to 1, 0, or negative bases, if an implementation supports them as an extension).

END_RATIONALE

4.4 cat - Concatenate and print files

4.4.1 Synopsis

    cat [-u] [file ...]

4.4.2 Description

The cat utility reads files in sequence and writes their contents to the standard output in the same sequence.

4.4.3 Options

The cat utility shall conform to the utility argument syntax guidelines described in 2.10.2.
The following option shall be supported by the implementation:

-u
    Write bytes from the input file to the standard output without delay as each is read.

4.4.4 Operands

The following operand shall be supported by the implementation:

file
    A pathname of an input file. If no file operands are specified, the standard input is used. If a file is -, the cat utility shall read from the standard input at that point in the sequence. The cat utility shall not close and reopen standard input when it is referenced in this way, but shall accept multiple occurrences of - as a file operand.

4.4.5 External Influences

4.4.5.1 Standard Input

The standard input is used only if no file operands are specified, or if a file operand is -. See Input Files.

4.4.5.2 Input Files

The input files can be any file type.

4.4.5.3 Environment Variables

The following environment variables shall affect the execution of cat:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.4.5.4 Asynchronous Events

Default.

4.4.6 External Effects

4.4.6.1 Standard Output

The standard output shall contain the sequence of bytes read from the input file(s). Nothing else shall be written to the standard output.
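Illustration only (not part of the normative text). The commands below exercise the file operand rules with scratch files whose names are hypothetical. Because standard input is redirected from a regular file, the first - copies that file in full at its place in the sequence, and the second - finds end-of-file immediately, since standard input is not closed and reopened.

```shell
# Scratch directory and file names are hypothetical.
dir=$(mktemp -d) || exit 1
printf 'start\n'  > "$dir/start"
printf 'middle\n' > "$dir/middle"
printf 'end\n'    > "$dir/end"
printf 'input\n'  > "$dir/input"

# Each - is read at its position in the operand list; the second one
# contributes nothing because standard input is already at end-of-file.
cat "$dir/start" - "$dir/middle" - "$dir/end" < "$dir/input"
```

The output should be the four lines start, input, middle, and end, in that order.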
4.4.6.2 Standard Error

Used only for diagnostic messages.

4.4.6.3 Output Files

None.

4.4.7 Extended Description

None.

4.4.8 Exit Status

The cat utility shall exit with one of the following values:

    0     All input files were output successfully.
    >0    An error occurred.

4.4.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.4.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

Historical versions of the cat utility include the options -e, -t, and -v, which permit the ends of lines, <tab>s, and invisible characters, respectively, to be rendered visible in the output. The working group omitted these options because they provide too fine a degree of control over what is made visible, and similar output can be obtained using a command such as:

    sed -n -e 's/$/$/' -e l pathname

The -s option was omitted because it corresponds to different functions in BSD and System V-based systems. The BSD -s option to squeeze blank lines will be handled by more -s in the UPE. The System V -s option to silence error messages can be accomplished by redirecting the standard error. An alternative to cat -s is the following shell script using sed:

    sed -n '
    # Write non-empty lines.
    /./   {
          p
          d
          }
    # Write a single empty line, then look for more empty lines.
    /^$/  p
    # Get next line, discard the held <newline> (empty line),
    # and look for more empty lines.
    :Empty
    /^$/  {
          N
          s/.//
          b Empty
          }
    # Write the non-empty line before going back to search
    # for the first in a set of empty lines.
          p
    '

Note that the BSD documentation for cat uses the term ``blank line'' to mean the same as the POSIX ``empty line'': a line consisting only of a <newline>.

The BSD -n option is omitted because similar functionality can be obtained from the -n option of the pr utility.

The -u option is included here for its value in prototyping nonblocking reads from FIFOs. The intent is to support the following sequence:

    mkfifo foo
    cat -u foo > /dev/tty13 &
    cat -u > foo

It is unspecified whether standard output is or is not buffered in the default case. This is sometimes of interest when standard output is associated with a terminal, since buffering may delay the output. The presence of the -u option guarantees that unbuffered I/O is available. It is implementation dependent whether the cat utility buffers output if the -u option is not specified. Traditionally, the -u option is implemented using the BSD setbuffer() function, the System V setbuf() function, or the C Standard {7} setvbuf() function.

The following command

    cat myfile

writes the contents of the file myfile to standard output.

The following command

    cat doc1 doc2 > doc.all

concatenates the files doc1 and doc2 and writes the result to doc.all.

Because of the shell language mechanism used to perform output redirection, a command such as this:

    cat doc doc.end > doc

causes the original data in doc to be lost, since the shell truncates doc before cat begins reading it.

Due to changes made to subclause 2.11.4 in Draft 11, the description of the file operand now states that - must be accepted multiple times, as in historical practice. This allows the command:

    cat start - middle - end > file

when standard input is a terminal, to get two arbitrary pieces of input from the terminal with a single invocation of cat.
Note, however, that if standard input is a regular file, this would be equivalent to the command:

    cat start - middle /dev/null end > file

because the entire contents of the file would be consumed by cat the first time - was used as a file operand, and an end-of-file condition would be detected immediately when - was referenced the second time.

History of Decisions Made

None.

END_RATIONALE

4.5 cd - Change working directory

4.5.1 Synopsis

    cd [directory]

4.5.2 Description

The cd utility shall change the working directory of the current shell execution environment; see 3.12. When invoked with no operands, and the HOME environment variable is set to a nonempty value, the directory named in the HOME environment variable shall become the new working directory. If HOME is empty or is undefined, the default behavior is implementation defined.

4.5.3 Options

None.

4.5.4 Operands

The following operands shall be supported by the implementation:

directory
    An absolute or relative pathname of the directory that becomes the new working directory. The interpretation of a relative pathname by cd depends on the CDPATH environment variable. If directory is -, the results are implementation defined.

4.5.5 External Influences

4.5.5.1 Standard Input

None.

4.5.5.2 Input Files

None.

4.5.5.3 Environment Variables

The following environment variables shall affect the execution of cd:

CDPATH
    A colon-separated list of pathnames that refer to directories.
    If the directory operand does not begin with a slash (/) character, and the first component is not dot or dot-dot, cd shall search for directory relative to each directory named in the CDPATH variable, in the order listed. The new working directory shall be set to the first matching directory found. An empty string in place of a directory pathname represents the current directory. If CDPATH is not set, it shall be treated as if it were an empty string.

HOME
    The name of the home directory, used when no directory operand is specified.

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.5.5.4 Asynchronous Events

Default.

4.5.6 External Effects

4.5.6.1 Standard Output

If a nonempty directory name from CDPATH is used, an absolute pathname of the new working directory shall be written to the standard output as follows:

    "%s\n", <new directory>

Otherwise, there shall be no output.

4.5.6.2 Standard Error

Used only for diagnostic messages.

4.5.6.3 Output Files

None.

4.5.7 Extended Description

None.

4.5.8 Exit Status

The cd utility shall exit with one of the following values:

    0     The directory was successfully changed.
    >0    An error occurred.
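Illustration only (not part of the normative text). The hypothetical shell function cdpath_resolve, which is not a standard utility, applies the CDPATH search of 4.5.5.3 and prints the directory that cd would select, without changing the working directory. It is a sketch under simplifying assumptions: it only checks that each candidate is a directory, not that it is searchable, and it prints the candidate as built from the CDPATH entry rather than the absolute pathname that 4.5.6.1 requires cd itself to write.

```shell
# Hypothetical helper: print the directory selected for operand $1 by the
# CDPATH rules; return nonzero if nothing matches. The _cdp_ prefix keeps
# its working variables from clashing with the caller's (POSIX sh has no
# local variables).
cdpath_resolve() {
    _cdp_dir=$1
    case $_cdp_dir in
        /* | . | .. | ./* | ../*)
            # Absolute, or first component is dot or dot-dot: no search.
            if [ -d "$_cdp_dir" ]; then
                printf '%s\n' "$_cdp_dir"
                return 0
            fi
            return 1
            ;;
    esac

    # An unset CDPATH is treated as an empty string (one empty entry).
    # The appended ":" lets the loop also see a trailing empty entry.
    _cdp_rest=${CDPATH-}:
    while [ -n "$_cdp_rest" ]; do
        _cdp_entry=${_cdp_rest%%:*}
        _cdp_rest=${_cdp_rest#*:}
        if [ -z "$_cdp_entry" ]; then
            _cdp_try=$_cdp_dir                 # empty entry: current directory
        else
            _cdp_try=${_cdp_entry%/}/$_cdp_dir
        fi
        if [ -d "$_cdp_try" ]; then
            printf '%s\n' "$_cdp_try"          # first match in list order wins
            return 0
        fi
    done
    return 1
}
```

For example, with the hypothetical setting CDPATH=/u/src:/u/doc, a directory /u/doc/notes, and no /u/src/notes, cdpath_resolve notes would print /u/doc/notes, while cdpath_resolve ./notes would consider only the current directory.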
4.5.9 Consequences of Errors

The working directory remains unchanged.

BEGIN_RATIONALE

4.5.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

Editor's Note: A balloter requested that the following rationale be highlighted in the D11.2 recirculation.

Since cd affects the current shell execution environment, it is generally provided as a shell regular built-in. If it is called in a subshell or separate utility execution environment, such as one of the following:

    (cd /tmp)
    nohup cd
    find . -exec cd {} \;

it will not affect the working directory of the caller's environment.

The CDPATH variable was introduced in the System V shell. Its use is analogous to the use of the PATH variable in the shell. Earlier systems such as the BSD C-shell used a shell parameter cdpath for this purpose.

History of Decisions Made

A common extension when HOME is undefined is to get the login directory from the user database for the invoking user. This does not occur on System V implementations.

Not included in this description are the features from the KornShell such as setting OLDPWD, toggling current and previous directory (cd -), and the two-operand form of cd (cd old new). This standard does not specify the results of cd - or of calls with more than one operand. Since these extensions are mostly used in interactive situations, they may be considered for inclusion in POSIX.2a. The results of cd - and of using no arguments with HOME unset or null have been made implementation defined at the request of the POSIX.6 security working group.
The setting of the PWD variable was removed from earlier drafts, as it can be replaced by $(pwd).

END_RATIONALE

4.6 chgrp - Change file group ownership

4.6.1 Synopsis

    chgrp [-R] group file ...

4.6.2 Description

The chgrp utility shall set the group ID of the file named by each file operand to the group ID specified by the group operand.

For each file operand, it shall perform actions equivalent to the POSIX.1 {8} chown() function, called with the following arguments:

    (1) The file operand shall be used as the path argument.

    (2) The user ID of the file shall be used as the owner argument.

    (3) The specified group ID shall be used as the group argument.

4.6.3 Options

The chgrp utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following option shall be supported by the implementation:

-R
    Recursively change file group IDs. For each file operand that names a directory, chgrp shall change the group of the directory and all files in the file hierarchy below it.

4.6.4 Operands

The following operands shall be supported by the implementation:

group
    A group name from the group database or a numeric group ID. Either specifies a group ID to be given to each file named by one of the file operands. If a numeric group operand exists in the group database as a group name, the group ID number associated with that group name is used as the group ID.

file
    A pathname of a file whose group ID is to be modified.

4.6.5 External Influences

4.6.5.1 Standard Input

None.

4.6.5.2 Input Files

None.
4.6.5.3 Environment Variables

The following environment variables shall affect the execution of chgrp:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.6.5.4 Asynchronous Events

Default.

4.6.6 External Effects

4.6.6.1 Standard Output

None.

4.6.6.2 Standard Error

Used only for diagnostic messages.

4.6.6.3 Output Files

None.

4.6.7 Extended Description

None.

4.6.8 Exit Status

The chgrp utility shall exit with one of the following values:

    0     The utility executed successfully and all requested changes were made.
    >0    An error occurred.

4.6.9 Consequences of Errors

If, when invoked with the -R option, chgrp attempts but fails to change the group ID of a particular file in a specified file hierarchy, it shall continue to process the remaining files in the hierarchy. If chgrp cannot read or search a directory within a hierarchy, it shall continue to process the other parts of the hierarchy that are accessible.

BEGIN_RATIONALE

4.6.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The System V and BSD versions use different exit status codes.
Some implementations used the exit status as a count of the number of errors that occurred; this practice is unworkable since the count can overflow the range of valid exit status values. The working group chose to mask these differences by specifying only 0 and >0 as exit values.

History of Decisions Made

The functionality of chgrp is described substantially through references to functions in POSIX.1 {8}. In this way, there is no duplication of effort required for describing the interactions of permissions, multiple groups, etc.

END_RATIONALE

4.7 chmod - Change file modes

4.7.1 Synopsis

    chmod [-R] mode file ...

4.7.2 Description

The chmod utility shall change any or all of the file mode bits of the file named by each file operand in the way specified by the mode operand.

It is implementation defined whether and how the chmod utility affects any alternate or additional file access control mechanism (see file access permissions in 2.2.2.55) being used for the specified file.

Only a process whose effective user ID matches the user ID of the file, or a process with the appropriate privileges, shall be permitted to change the file mode bits of a file.

4.7.3 Options

The chmod utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following option shall be supported by the implementation:

-R
    Recursively change file mode bits. For each file operand that names a directory, chmod shall change the file mode bits of the directory and all files in the file hierarchy below it.
4.7.4 Operands

The following operands shall be supported by the implementation:

mode
    Represents the change to be made to the file mode bits of each file named by one of the file operands, as described in 4.7.7.

file
    A pathname of a file whose file mode bits are to be modified.

4.7.5 External Influences

4.7.5.1 Standard Input

None.

4.7.5.2 Input Files

None.

4.7.5.3 Environment Variables

The following environment variables shall affect the execution of chmod:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.7.5.4 Asynchronous Events

Default.

4.7.6 External Effects

4.7.6.1 Standard Output

None.

4.7.6.2 Standard Error

Used only for diagnostic messages.

4.7.6.3 Output Files

None.

4.7.7 Extended Description

The mode operand shall be either a symbolic_mode expression or a nonnegative octal integer. The symbolic_mode form is described by the grammar in 4.7.7.1.

Each clause shall specify an operation to be performed on the current file mode bits of each file. The operations shall be performed on each file in the order in which the clauses are specified.
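Illustration only (not part of the normative text). Because clauses are applied in order, the same two clauses can produce different results when reordered. The scratch file and the helper mode_of, which prints the type and permission columns of ls -l output, are hypothetical. Both clauses name who explicitly, so the file mode creation mask plays no part.

```shell
# Hypothetical helper: print the first ten columns of ls -l output
# (file type plus the nine permission characters).
mode_of() { ls -ld -- "$1" | cut -c1-10; }

f=$(mktemp) || exit 1        # scratch file (hypothetical)
chmod a=r,u+w "$f"           # a=r first, then u+w adds owner write
mode_of "$f"                 # -rw-r--r--
chmod u+w,a=r "$f"           # u+w first, then a=r clears it again
mode_of "$f"                 # -r--r--r--
```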
The who symbols u, g, and o shall specify the user, group, and other parts of the file mode bits, respectively. A who consisting of the symbol a shall be equivalent to ugo.

The perm symbols r, w, and x represent the read, write, and execute/search portions of file mode bits, respectively. The perm symbol s shall represent the set-user-ID-on-execution (when who contains or implies u) and set-group-ID-on-execution (when who contains or implies g) bits.

The perm symbol X shall represent the execute/search portion of the file mode bits if the file is a directory or if the current (unmodified) file mode bits have at least one of the execute bits (S_IXUSR, S_IXGRP, or S_IXOTH) set. It shall be ignored if the file is not a directory and none of the execute bits are set in the current file mode bits.

The permcopy symbols u, g, and o shall represent the current permissions associated with the user, group, and other parts of the file mode bits, respectively. For the remainder of subclause 4.7.7 up to subclause 4.7.7.1, perm refers to the nonterminals perm and permcopy in the grammar in 4.7.7.1.

If multiple actionlists are grouped with a single wholist in the grammar, each actionlist shall be applied in the order specified with that wholist. The op symbols shall represent the operation performed, as follows:

    +   If perm is not specified, the + operation shall not change the file mode bits. If who is not specified, the file mode bits represented by perm for the owner, group, and other permissions, except for those with corresponding bits in the file mode creation mask of the invoking process, shall be set. Otherwise, the file mode bits represented by the specified who and perm values shall be set.
    -   If perm is not specified, the - operation shall not change the file mode bits. If who is not specified, the file mode bits represented by perm for the owner, group, and other permissions, except for those with corresponding bits in the file mode creation mask of the invoking process, shall be cleared. Otherwise, the file mode bits represented by the specified who and perm values shall be cleared.

    =   Clear the file mode bits specified by the who value, or, if no who value is specified, all of the file mode bits specified in this standard. If perm is not specified, the = operation shall make no further modifications to the file mode bits. If who is not specified, the file mode bits represented by perm for the owner, group, and other permissions, except for those with corresponding bits in the file mode creation mask of the invoking process, shall be set. Otherwise, the file mode bits represented by the specified who and perm values shall be set.

When using the symbolic mode form on a regular file, it is implementation defined whether or not:

    (1) Requests to set the set-user-ID-on-execution or set-group-ID-on-execution bit when all execute bits are currently clear and none are being set are ignored,

    (2) Requests to clear all execute bits also clear the set-user-ID-on-execution and set-group-ID-on-execution bits, or

    (3) Requests to clear the set-user-ID-on-execution or set-group-ID-on-execution bits when all execute bits are currently clear are ignored.

However, if the command ls -l file (see 4.39.6.1) writes an s in the position indicating that the set-user-ID-on-execution or set-group-ID-on-execution bit is set, the commands chmod u-s file or chmod g-s file, respectively, shall not be ignored.
When using the symbolic mode form on other file types, it is implementation defined whether or not requests to set or clear the set-user-ID-on-execution or set-group-ID-on-execution bits are honored.

If the who symbol o is used in conjunction with the perm symbol s with no other who symbols being specified, the set-user-ID-on-execution and set-group-ID-on-execution bits shall not be modified. It shall not be an error to specify the who symbol o in conjunction with the perm symbol s.

For an octal integer mode operand, the file mode bits shall be set absolutely. The octal number form of the mode operand is obsolescent.

For each bit set in the octal number, the corresponding file permission bit shown in the following table shall be set; all other file permission bits shall be cleared. For regular files, for each bit set in the octal number that corresponds to the set-user-ID-on-execution or the set-group-ID-on-execution bit shown in the following table, that bit shall be set; if these bits are not set in the octal number, they shall be cleared. For other file types, it is implementation defined whether or not requests to set or clear the set-user-ID-on-execution or set-group-ID-on-execution bits are honored.

    Octal  Mode Bit   Octal  Mode Bit   Octal  Mode Bit   Octal  Mode Bit
    -----  --------   -----  --------   -----  --------   -----  --------
    4000   S_ISUID    0400   S_IRUSR    0040   S_IRGRP    0004   S_IROTH
    2000   S_ISGID    0200   S_IWUSR    0020   S_IWGRP    0002   S_IWOTH
                      0100   S_IXUSR    0010   S_IXGRP    0001   S_IXOTH

When bits are set in the octal number other than those listed in the table above, the behavior is unspecified.
4.7.7.1 chmod Grammar

The grammar and lexical conventions in this subclause describe the syntax for the symbolic_mode operand. The general conventions for this style of grammar are described in 2.1.2. A valid symbolic_mode can be represented as the nonterminal symbol symbolic_mode in the grammar. Any discrepancies found between this grammar and descriptions in the rest of this clause shall be resolved in favor of this grammar.

The lexical processing shall be based entirely on single characters. Implementations need not allow <blank>s within the single argument being processed.

    %start    symbolic_mode
    %%

    symbolic_mode    : clause
                     | symbolic_mode ',' clause
                     ;

    clause           : actionlist
                     | wholist actionlist
                     ;

    wholist          : who
                     | wholist who
                     ;

    who              : 'u' | 'g' | 'o' | 'a'
                     ;

    actionlist       : action
                     | actionlist action
                     ;

    action           : op
                     | op permlist
                     | op permcopy
                     ;

    permcopy         : 'u' | 'g' | 'o'
                     ;

    op               : '+' | '-' | '='
                     ;

    permlist         : perm
                     | perm permlist
                     ;

    perm             : 'r' | 'w' | 'x' | 'X' | 's'
                     ;

4.7.8 Exit Status

The chmod utility shall exit with one of the following values:

     0   The utility executed successfully and all requested changes were made.

    >0   An error occurred.

4.7.9 Consequences of Errors

If, when invoked with the -R option, chmod attempts but fails to change the mode of a particular file in a specified file hierarchy, it shall continue to process the remaining files in the hierarchy, affecting the final exit status. If chmod cannot read or search a directory within a hierarchy, it shall continue to process the other parts of the hierarchy that are accessible.

BEGIN_RATIONALE

4.7.10 Rationale.
(This subclause is not a part of P1003.2)

Examples, Usage

The functionality of chmod is described substantially through references to concepts defined in POSIX.1 {8}. In this way, there is less duplication of effort required for describing the interactions of permissions, etc. However, the behavior of this utility is not described in terms of the chmod() function from POSIX.1 {8}, because that specification requires certain side effects upon alternate file access control mechanisms that might not be appropriate, depending on the implementation.

Some historical implementations of the chmod utility change the mode of a directory before the files in the directory when performing a recursive (-R option) change; others change the directory mode after the files in the directory. If an application tries to remove read or search permission for a file hierarchy, the removal attempt will fail if the directory is changed first; on the other hand, trying to re-enable permissions to a restricted hierarchy will fail if directories are changed last. Since neither method is clearly better and users do not frequently try to make a hierarchy inaccessible to themselves, the standard does not specify what happens in this case.

Note that although the association shown in the table between bits in the octal number and the indicated file mode bits must be supported, this does not require that a conforming implementation actually use those octal values to implement the macros shown.

Historical System V implementations of chmod never use the process's umask when changing modes. Version 7 and historical BSD systems do use the mask when who is not specified, as described in this standard.
Applications should note the difference between:

    chmod a-w file

which removes all write permissions, and:

    chmod -- -w file

which removes the write permissions that would be allowed if file were created with the same umask.

Note that mode operands -r, -w, -s, -x, or -X, or anything else beginning with a hyphen, must be preceded by -- to keep them from being interpreted as options.

It is difficult to express the grammar used by chmod in English, but the following examples have been accepted by historical System V and BSD systems and are, therefore, required to behave this way by POSIX.2, even though some of them could be expressed more succinctly:

    Mode    Results
    -----   ------------------------------------------------------------
    a+=     Equivalent to a+,a=; clears all file mode bits.
    go+-w   Equivalent to go+,go-w; clears group and other write bits.
    g=o-w   Equivalent to g=o,g-w; sets group bits to match other bits
            and then clears group write bit.
    g-r+w   Equivalent to g-r,g+w; clears group read bit and sets group
            write bit.
    =g      Sets owner bits to match group bits and sets other bits to
            match group bits.

History of Decisions Made

Implementations that support mandatory file and record locking as specified by the /usr/group Standard {B29} historically used the combination of set-group-ID bit set and group execute bit clear to indicate mandatory locking. This condition is usually set or cleared with the symbolic mode perm symbol l instead of the perm symbols s and x, so that the mandatory locking mode is not changed without explicit indication that that was what the user intended. Therefore, the details on how the implementation treats these conditions must be defined in the documentation.
This standard does not require mandatory locking (nor does POSIX.1 {8}), but does allow it as an extension. However, POSIX.2 does require that the ls and chmod utilities work consistently in this area. If ls -l file says the set-group-ID bit is set, chmod g-s file must clear it (assuming appropriate privileges exist to change modes).

The System V and BSD versions use different exit status codes. Some implementations used the exit status as a count of the number of errors that occurred; this practice is unworkable since it can overflow the range of valid exit status values. This problem is avoided here by specifying only 0 and >0 as exit values.

A ``sticky'' file mode bit, indicating that the text portion of an executable object program file should be saved after the program is gone, has meaning in some implementations, but was omitted here because its purpose is implementation dependent and because it was omitted from POSIX.1 {8}. On 4.3BSD-based implementations, the sticky bit is used in conjunction with directory permissions to keep anyone from deleting a file that they do not own from the directory. The perm symbol t is used to represent the sticky bit in many existing implementations and should not be used for other conflicting extensions.

POSIX.1 {8} indicates that implementation-defined restrictions may cause the S_ISUID and S_ISGID bits to be ignored. POSIX.2 allows the chmod utility to choose to modify these bits before calling POSIX.1 {8} chmod() (or some function providing equivalent capabilities) for nonregular files. Among other things, this allows implementations that use the set-user-ID and set-group-ID bits on directories to enable extended features to handle these extensions in an intelligent manner.
Portable applications should never assume that they know how these bits will be interpreted, except on regular files.

The grammar in Draft 9 did not allow several symbolic mode operands that are correctly processed by historical implementations. (It only allowed two clauses and one op per clause.) The grammar presented in Draft 10 matches historical implementations.

The X perm symbol was added, as provided in BSD-based systems, because it provides commonly desired functionality when doing recursive (-R option) modifications. Similar functionality is not provided by the find utility. Historical BSD versions of chmod, however, only supported X with op +; it has been extended here because it is also useful with op =. (It has also been added for op - even though it duplicates x, in this case, because it is intuitive and easier to explain.)

The grammar was extended with the permcopy nonterminal to allow existing-practice forms of symbolic modes like o=u-g (i.e., set the ``other'' permissions to the permissions of ``owner'' minus the permissions of ``group'').

END_RATIONALE

4.8 chown - Change file ownership

4.8.1 Synopsis

    chown [-R] owner[:group] file ...

4.8.2 Description

The chown utility shall set the user ID of the file named by each file operand to the user ID specified by the owner operand.

For each file operand, it shall perform actions equivalent to the POSIX.1 {8} chown() function, called with the following arguments:

    (1) The file operand shall be used as the path argument.

    (2) The user ID indicated by the owner portion of the first operand shall be used as the owner argument.
    (3) If the group portion of the first operand is given, the group ID indicated by it shall be used as the group argument; otherwise, the group ID of the file shall be used as the group argument.

4.8.3 Options

The chown utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following option shall be supported by the implementation:

    -R   Recursively change file user IDs, and if the group operand is specified, group IDs. For each file operand that names a directory, chown shall change the user ID (and, if the group operand is specified, the group ID) of the directory and all files in the file hierarchy below it.

4.8.4 Operands

The following operands shall be supported by the implementation:

    owner[:group]
         A user ID and optional group ID to be assigned to each file. The owner portion of this operand shall be a user name from the user database or a numeric user ID. Either specifies a user ID to be given to each file named by one of the file operands. If a numeric owner operand exists in the user database as a user name, the user ID number associated with that user name shall be used as the user ID. Similarly, if the group portion of this operand is present, it shall be a group name from the group database or a numeric group ID. Either specifies a group ID to be given to each file. If a numeric group operand exists in the group database as a group name, the group ID number associated with that group name shall be used as the group ID.

    file A pathname of a file whose user ID is to be modified.

4.8.5 External Influences

4.8.5.1 Standard Input

None.

4.8.5.2 Input Files

None.
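The precedence rule in 4.8.4 says that a name found in the user or group database wins over the same string read as a number. It can be sketched with the POSIX getpwnam() and getgrnam() functions. The helper names below are hypothetical, and the sketch does not check that a numeric ID fits in uid_t or gid_t.

```c
#include <errno.h>
#include <grp.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Accept only a nonempty string of decimal digits. */
static int parse_id(const char *s, unsigned long *out)
{
    char *end;

    if (*s < '0' || *s > '9')
        return -1;
    errno = 0;
    *out = strtoul(s, &end, 10);
    return (errno != 0 || *end != '\0') ? -1 : 0;
}

/* Look s up as a user name first; only if that fails, read it as a number. */
int resolve_user(const char *s, uid_t *uid)
{
    struct passwd *pw = getpwnam(s);
    unsigned long n;

    if (pw != NULL) {
        *uid = pw->pw_uid;
        return 0;
    }
    if (parse_id(s, &n) != 0)
        return -1;
    *uid = (uid_t)n;
    return 0;
}

/* The same rule for the optional group portion. */
int resolve_group(const char *s, gid_t *gid)
{
    struct group *gr = getgrnam(s);
    unsigned long n;

    if (gr != NULL) {
        *gid = gr->gr_gid;
        return 0;
    }
    if (parse_id(s, &n) != 0)
        return -1;
    *gid = (gid_t)n;
    return 0;
}

/* Split "owner[:group]" in place; *group is NULL when no colon appears. */
void split_operand(char *operand, char **owner, char **group)
{
    char *colon = strchr(operand, ':');

    *owner = operand;
    *group = NULL;
    if (colon != NULL) {
        *colon = '\0';
        *group = colon + 1;
    }
}
```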
4.8.5.3 Environment Variables

The following environment variables shall affect the execution of chown:

    LANG         This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

    LC_ALL       This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

    LC_CTYPE     This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

    LC_MESSAGES  This variable shall determine the language in which messages should be written.

4.8.5.4 Asynchronous Events

Default.

4.8.6 External Effects

4.8.6.1 Standard Output

None.

4.8.6.2 Standard Error

Used only for diagnostic messages.

4.8.6.3 Output Files

None.

4.8.7 Extended Description

None.

4.8.8 Exit Status

The chown utility shall exit with one of the following values:

     0   The utility executed successfully and all requested changes were made.

    >0   An error occurred.

4.8.9 Consequences of Errors

If, when invoked with the -R option, chown attempts but fails to change the user ID or, if the group operand is specified, the group ID of a particular file in a specified file hierarchy, it shall continue to process the remaining files in the hierarchy. If chown cannot read or search a directory within a hierarchy, it shall continue to process the other parts of the hierarchy that are accessible.

BEGIN_RATIONALE

4.8.10 Rationale.
(This subclause is not a part of P1003.2)

Examples, Usage

The System V and BSD versions use different exit status codes. Some implementations used the exit status as a count of the number of errors that occurred; this practice is unworkable since it can overflow the range of valid exit status values. This problem is avoided here by specifying only 0 and >0 as exit values.

The functionality of chown is described substantially through references to functions in POSIX.1 {8}. In this way, there is no duplication of effort required for describing the interactions of permissions, multiple groups, etc.

For implementations on which symbolic links are supported, actual use of the chown() function to implement this utility might not be appropriate, depending on the implementation.

History of Decisions Made

The 4.3BSD method of specifying both owner and group was included in this standard because:

    (1) There are cases where the desired end condition could not be achieved using the chgrp and chown (that only changed the user ID) utilities. [If the current owner is not a member of the desired group and the desired owner is not a member of the current group, the chown() function could fail unless both owner and group are changed at the same time.]

    (2) Even if they could be changed independently, in cases where both are being changed, there is a 100 percent performance penalty caused by being forced to invoke both utilities.

The BSD syntax user[.group] was changed to user[:group] in POSIX.2 because the period is a valid character in login names (as specified by POSIX.1 {8}, login names consist of characters in the portable filename character set). The colon character was chosen as the replacement for the period character because it would never be allowed as a character in a user name or group name on traditional implementations.
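Because chown() takes both IDs in one call, owner and group can be changed together. This avoids the failure described in item (1). A minimal sketch follows, with a hypothetical helper name, assuming the IDs have already been resolved. It also shows rule (3) of 4.8.2: without a group, the file's current group ID is passed.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Change the owner of path, and its group if have_group is nonzero, in a
 * single chown() call. Without a group, the file's current group ID is
 * passed, as in 4.8.2 (3). Returns 0 on success, -1 on error (errno set). */
int chown_ids(const char *path, uid_t uid, int have_group, gid_t gid)
{
    struct stat st;

    if (!have_group) {
        if (stat(path, &st) != 0)
            return -1;
        gid = st.st_gid;
    }
    return chown(path, uid, gid);
}
```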
The -R option is considered by some observers as an undesirable departure from the traditional UNIX system tools approach; since a tool, find, already exists to recurse over directories, there was felt to be no good reason to require other tools to have to duplicate that functionality. However, the -R option was deemed an important user convenience, is far more efficient than forking a separate process for each element of the directory hierarchy, and is in widespread historical use.

END_RATIONALE

4.9 cksum - Write file checksums and sizes

4.9.1 Synopsis

    cksum [file ...]

4.9.2 Description

The cksum utility shall calculate and write to standard output a cyclic redundancy check (CRC) for each input file, and also write to standard output the number of octets in each file. The CRC used is based on the polynomial used for CRC error checking in the networking standard ISO 8802-3 {B7}.

The CRC checksum shall be obtained in the following way:

The encoding is defined by the generating polynomial:

$$G(x) = x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1$$

Mathematically, the CRC value corresponding to a given file shall be defined by the following procedure:

    (1) The $n$ bits to be evaluated are considered to be the coefficients of a mod 2 polynomial $M(x)$ of degree $n-1$. These $n$ bits are the bits from the file, with the most significant bit being the most significant bit of the first octet of the file and the last bit being the least significant bit of the last octet, padded with zero bits (if necessary) to achieve an integral number of octets, followed by one or more octets representing the length of the file as a binary value, least significant octet first.
        The smallest number of octets capable of representing this integer shall be used.

    (2) $M(x)$ is multiplied by $x^{32}$ (i.e., shifted left 32 bits) and divided by $G(x)$ using mod 2 division, producing a remainder $R(x)$ of degree $\le 31$.

    (3) The coefficients of $R(x)$ are considered to be a 32-bit sequence.

    (4) The bit sequence is complemented and the result is the CRC.

4.9.3 Options

None.

4.9.4 Operands

The following operand shall be supported by the implementation:

    file   A pathname of a file to be checked. If no file operands are specified, the standard input is used.

4.9.5 External Influences

4.9.5.1 Standard Input

The standard input is used only if no file operands are specified. See Input Files.

4.9.5.2 Input Files

The input files can be any file type.

4.9.5.3 Environment Variables

The following environment variables shall affect the execution of cksum:

    LANG         This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

    LC_ALL       This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

    LC_CTYPE     This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

    LC_MESSAGES  This variable shall determine the language in which messages should be written.

4.9.5.4 Asynchronous Events

Default.
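Steps (1) through (4) of 4.9.2 can be read directly as a bit-at-a-time computation. The following is offered as a sketch rather than an efficient implementation. The names are hypothetical, and uint32_t from <stdint.h> is assumed to be available.

Shifting each octet through the register most significant bit first performs the mod 2 division by G(x), with the multiplication by x^32 folded in. crc_octet(0, i) is entry i of the lookup table that a table-driven update such as the model memcrc() function in the rationale needs. The append_length flag exists only to show what the length octets contribute.

```c
#include <stddef.h>
#include <stdint.h>

#define CKSUM_POLY 0x04C11DB7u     /* G(x) without its x^32 term */

/* Shift one octet, most significant bit first, through the remainder. */
static uint32_t crc_octet(uint32_t r, unsigned char octet)
{
    int i;

    r ^= (uint32_t)octet << 24;
    for (i = 0; i < 8; i++)
        r = (r & 0x80000000u) ? (r << 1) ^ CKSUM_POLY : r << 1;
    return r;
}

/* CRC of n octets per 4.9.2. With append_length == 0 the length octets
 * of step (1) are omitted, which is useful only for comparison. */
uint32_t cksum_crc(const unsigned char *b, size_t n, int append_length)
{
    uint32_t r = 0;
    size_t i, len = n;

    for (i = 0; i < n; i++)
        r = crc_octet(r, b[i]);
    if (append_length) {
        while (len != 0) {         /* least significant octet first, */
            r = crc_octet(r, (unsigned char)(len & 0xff));
            len >>= 8;             /* using as few octets as possible */
        }
    }
    return ~r;                     /* step (4): complement */
}
```

With these definitions, empty input gives 4294967295 and the nine octets 123456789 give 930766865. Without the length octets, a leading run of zero octets leaves the remainder unchanged, so 0, 0, x and x would collide. Appending the length separates them, which is the reason given in the rationale.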
4.9.6 External Effects

4.9.6.1 Standard Output

For each file processed successfully, the cksum utility shall write in the following format:

    "%u %d %s\n", <checksum>, <# of octets>, <pathname>

If no file operand was specified, the pathname and its leading space shall be omitted.

4.9.6.2 Standard Error

Used only for diagnostic messages.

4.9.6.3 Output Files

None.

4.9.7 Extended Description

None.

4.9.8 Exit Status

The cksum utility shall exit with one of the following values:

     0   All files were processed successfully.

    >0   An error occurred.

4.9.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.9.10 Rationale.

(This subclause is not a part of P1003.2)

Examples, Usage

The cksum utility is typically used to quickly compare a suspect file against a trusted version of the same. However, no claims are made by POSIX.2 that this comparison is cryptographically secure; the historical sum utility, which inspired cksum, has traditionally been used mainly to ensure that files transmitted over noisy media arrive intact. The chances of a damaged file producing the same CRC as the original are astronomically small; deliberate deception is difficult, but probably not impossible.

Although input files to cksum can be any type, the results need not be what would be expected on character special device files or on file types not described by POSIX.1 {8}. Since POSIX.2 does not specify the block size used when doing input, checksums of character special files need not process all of the data in those files.

The algorithm is expressed in terms of a bitstream divided into octets.
If a file is transmitted between two systems and undergoes any data transformation (such as moving 8-bit characters into 9-bit bytes or changing ``little Endian'' byte ordering to ``big Endian''), identical CRC values cannot be expected. Implementations performing such transformations may extend cksum to handle such situations.

The following C-language program can be used as a model to describe the algorithm. It assumes that a char is one octet. It also assumes that the entire file is available for one pass through the function. This was done for simplicity in demonstrating the algorithm, rather than as an implementation model.

    #include <stddef.h>

    static unsigned long crctab[] = {
    0x0,
    0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f,
    0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e,
    0x97d2d988, 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91,
    0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de, 0x1adad47d,
    0x6ddde4eb, 0xf4d4b551, 0x83d385c7, 0x136c9856, 0x646ba8c0,
    0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9, 0xfa0f3d63,
    0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
    0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, 0x35b5a8fa,
    0x42b2986c, 0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75,
    0xdcd60dcf, 0xabd13d59, 0x26d930ac, 0x51de003a, 0xc8d75180,
    0xbfd06116, 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f,
    0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, 0x2f6f7c87,
    0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190, 0x01db7106,
    0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5,
    0xe8b8d433, 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818,
    0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01, 0x6b6b51f4,
    0x1c6c6162, 0x856530d8, 0xf262004e, 0x6c0695ed, 0x1b01a57b,
    0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea,
    0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
    0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541,
    0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a, 0x346ed9fc,
    0xad678846, 0xda60b8d0, 0x44042d73, 0x33031de5, 0xaa0a4c5f,
    0xdd0d7cc9, 0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086,
    0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f, 0x5edef90e,
    0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81,
    0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6, 0x03b6e20c,
    0x74b1d29a, 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683,
    0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b,
    0x9309ff9d, 0x0a00ae27, 0x7d079eb1, 0xf00f9344, 0x8708a3d2,
    0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb, 0x196c3671,
    0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
    0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, 0xd6d6a3e8,
    0xa1d1937e, 0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767,
    0x3fb506dd, 0x48b2364b, 0xd80d2bda, 0xaf0a1b4c, 0x36034af6,
    0x41047a60, 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79,
    0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, 0xcc0c7795,
    0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe, 0xb2bd0b28,
    0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b,
    0x5bdeae1d, 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a,
    0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713, 0x95bf4a82,
    0xe2b87a14, 0x7bb12bae, 0x0cb61b38, 0x92d28e9b, 0xe5d5be0d,
    0x7cdcefb7, 0x0bdbdf21, 0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8,
    0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
    0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff,
    0xf862ae69, 0x616bffd3, 0x166ccf45, 0xa00ae278, 0xd70dd2ee,
    0x4e048354, 0x3903b3c2, 0xa7672661, 0xd06016f7, 0x4969474d,
    0x3e6e77db, 0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0,
    0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9, 0xbdbdf21c,
    0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693,
    0x54de5729, 0x23d967bf, 0xb3667a2e, 0xc4614ab8, 0x5d681b02,
    0x2a6f2b94, 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d
    };

    unsigned long memcrc(const unsigned char *b, size_t n)
    {
        /* Input arguments:
         *   const unsigned char *b == byte sequence to checksum
         *   size_t n               == length of sequence
         * Only the low-order 32 bits of s are significant; the masks
         * keep that true where unsigned long is wider than 32 bits.
         */
        size_t i;
        unsigned int c;
        unsigned long s = 0;

        for (i = n; i > 0; --i) {
            c = *b++;
            s = ((s << 8) ^ crctab[(s >> 24) ^ c]) & 0xffffffffUL;
        }

        /* extend with the length of the string, least significant octet first */
        while (n != 0) {
            c = n & 0377;
            n >>= 8;
            s = ((s << 8) ^ crctab[(s >> 24) ^ c]) & 0xffffffffUL;
        }

        /* step (4): the complement of the remainder is the CRC */
        return ~s & 0xffffffffUL;
    }

Note that the update step s = (s << 8) ^ crctab[(s >> 24) ^ c] processes each octet most significant bit first. Entry i of crctab must therefore be the remainder obtained for the octet value i in the high-order position of the 32-bit register; that table begins 0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9. The table shown above, which begins 0x0, 0x77073096, is the bit-reflected table for the same polynomial. Used with this update step, it does not produce the CRC defined in 4.9.2.

History of Decisions Made

The historical practice of writing the number of ``blocks'' has been removed in favor of writing the number of octets, since the latter is not only more useful, but historical implementations have not been consistent in defining what a ``block'' meant. Octets are used instead of bytes because bytes can differ in size between systems.

The algorithm used was selected to increase the robustness of the utility's operation. Neither the System V nor BSD sum algorithm was selected. Since each of these was different and each was the default behavior on those systems, no realistic compromise was available if either were selected; some set of historical applications would break. Therefore, the name was changed to cksum. Although the historical sum commands will probably continue to be provided for many years to come, programs designed for portability across systems should use the new name.

The algorithm selected is based on that used by the Ethernet standard for the Frame Check Sequence Field.
The algorithm used does not match the technical definition of a checksum; the term is used for historical reasons. The length of the file is included in the CRC calculation because this parallels Ethernet's inclusion of a length field in its CRC, but also because it guards against inadvertent collisions between files that begin with different series of zero octets. The chance that two different files will produce identical CRCs is much greater when their lengths are not considered. Keeping the length and the checksum of the file itself separate would yield a slightly more robust algorithm, but historical usage has always been that a single number (the checksum as printed) represents the signature of the file. It was decided that historical usage was the more important consideration.

Earlier drafts contained modifications to the Ethernet algorithm that involved extracting table values whenever an intermediate result became zero. This was demonstrated to be less robust than the current method and mathematically difficult to describe or justify.

Editor's Note: The following bibliographic references will be cleaned up before the standard is completed.

The calculation used is identical to that given in pseudo-code on page 1011 of Communications of the ACM, August 1988, in the article ``Computation of Cyclic Redundancy Checks Via Table Lookup'' by Dilip V. Sarwate.
The pseudo-code rendition is:

    X <- 0; Y <- 0;
    for i <- m -1 step -1 until 0 do
        begin
        T <- X(1) ^ A[i];
        X(1) <- X(0); X(0) <- Y(1); Y(1) <- Y(0); Y(0) <- 0;
        comment: f[T] and f'[T] denote the T-th words in the
            table f and f' ;
        X <- X ^ f[T]; Y <- Y ^ f'[T];
        end

The pseudo-code is reproduced exactly as given. Note, however, that in cksum's case, A[i] represents a byte of the file, the words X and Y are treated as a single 32-bit value, and the tables f and f' are a single table containing 32-bit values. The article also discusses generating the table(s).

Other sources consulted about CRCs: ``A Tutorial on CRC Computations,'' Ramabadran and Gaitonde, IEEE Micro, p. 62, August 1988; Computer Networks, Andrew Tanenbaum, Prentice-Hall, Inc.

END_RATIONALE

4.10 cmp - Compare two files

4.10.1 Synopsis

    cmp [ -l | -s ] file1 file2

4.10.2 Description

The cmp utility shall compare two files. The cmp utility shall write no output if the files are the same. Under default options, if they differ, it shall write to standard output the byte and line number at which the first difference occurred. Bytes and lines shall be numbered beginning with 1.

4.10.3 Options

The cmp utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-l    (Lowercase ell.) Write the byte number (decimal) and the differing bytes (octal) for each difference.

-s    Write nothing for differing files; return exit status only.

4.10.4 Operands

The following operands shall be supported by the implementation:

file1    A pathname of the first file to be compared. If file1 is -, the standard input shall be used.

file2    A pathname of the second file to be compared.
         If file2 is -, the standard input shall be used.

If both file1 and file2 refer to standard input or refer to the same FIFO special, block special, or character special file, the results are undefined.

4.10.5 External Influences

4.10.5.1 Standard Input

The standard input shall be used only if the file1 or file2 operand refers to standard input. See Input Files.

4.10.5.2 Input Files

The input files can be any file type.

4.10.5.3 Environment Variables

The following environment variables shall affect the execution of cmp:

LANG    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES    This variable shall determine the language in which messages should be written.

4.10.5.4 Asynchronous Events

Default.

4.10.6 External Effects

4.10.6.1 Standard Output

In the POSIX Locale, results of the comparison shall be written to standard output. When no options are used, the format shall be:
    "%s %s differ: char %d, line %d\n", file1, file2,
        <byte number>, <line number>

When the -l option is used, the format is:

    "%d %o %o\n", <byte number>, <differing byte>,
        <differing byte>

for each byte that differs. The first <differing byte> number is from file1 while the second is from file2. In both cases, <byte number> shall be relative to the beginning of the file, beginning with 1.

The <additional info> field shall either be null or a string that starts with a <blank> and contains no <newline> characters.

No output shall be written to standard output when the -s option is used.

4.10.6.2 Standard Error

Used only for diagnostic messages. If file1 and file2 are identical for the entire length of the shorter file, in the POSIX Locale the following diagnostic message shall be written, unless the -s option is specified:

    "cmp: EOF on %s%s\n", <name of shorter file>, <additional info>

4.10.6.3 Output Files

None.

4.10.7 Extended Description

None.

4.10.8 Exit Status

The cmp utility shall exit with one of the following values:

 0    The files are identical.

 1    The files are different; this includes the case where one file is identical to the first part of the other.

>1    An error occurred.

4.10.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.10.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The global language in Section 2 indicates that using two mutually exclusive options together produces unspecified results. Some System V implementations consider the option usage:

    cmp -l -s ...

to be an error.
They also treat:

    cmp -s -l ...

as if no options were specified. Both of these behaviors are considered bugs, but are allowed.

Although input files to cmp can be any type, the results might not be what would be expected on character special device files or on file types not described by POSIX.1 {8}. Since POSIX.2 does not specify the block size used when doing input, comparisons of character special files need not compare all of the data in those files.

The word char in the standard output format comes from historical usage, even though it is actually a byte number. When cmp is supported in other locales, implementations are encouraged to use the word byte or its equivalent in another language. Users should not interpret this difference to indicate that the functionality of the utility changed between locales.

History of Decisions Made

Some systems report on the number of lines in the identical-but-shorter file case. This is allowed by the inclusion of the <additional info> fields in the output format. The restriction on having a leading <blank> and no <newline>s is to make parsing for the file name easier. It is recognized that some file names containing white-space characters will make parsing difficult anyway, but the restriction does aid programs used on systems where the names are predominantly well behaved.

END_RATIONALE

4.11 comm - Select or reject lines common to two files

4.11.1 Synopsis

    comm [-123] file1 file2

4.11.2 Description

The comm utility shall read file1 and file2, which should be ordered in the current collating sequence, and produce three text columns as output: lines only in file1; lines only in file2; and lines in both files.
If the lines in both files are not ordered according to the collating sequence of the current locale, the results are unspecified.

4.11.3 Options

The comm utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-1    Suppress the output column of lines unique to file1.

-2    Suppress the output column of lines unique to file2.

-3    Suppress the output column of lines duplicated in file1 and file2.

4.11.4 Operands

The following operands shall be supported by the implementation:

file1    A pathname of the first file to be compared. If file1 is -, the standard input is used.

file2    A pathname of the second file to be compared. If file2 is -, the standard input is used.

If both file1 and file2 refer to standard input or to the same FIFO special, block special, or character special file, the results are undefined.

4.11.5 External Influences

4.11.5.1 Standard Input

The standard input shall be used only if one of the file1 or file2 operands refers to standard input. See Input Files.

4.11.5.2 Input Files

The input files shall be text files.

4.11.5.3 Environment Variables

The following environment variables shall affect the execution of comm:

LANG    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
LC_CTYPE    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

LC_COLLATE    This variable shall determine the locale for the collating sequence comm expects to have been used when the input files were sorted.

LC_MESSAGES    This variable shall determine the language in which messages should be written.

4.11.5.4 Asynchronous Events

Default.

4.11.6 External Effects

4.11.6.1 Standard Output

The comm utility shall produce output depending on the options selected. If the -1, -2, and -3 options are all selected, comm shall write nothing to standard output.

If the -1 option is not selected, lines contained only in file1 shall be written using the format:

    "%s\n", <line in file1>

If the -2 option is not selected, lines contained only in file2 shall be written using the format:

    "%s%s\n", <lead>, <line in file2>

where the string <lead> is:

    <tab>          if the -1 option is not selected, or
    null string    if the -1 option is selected.

If the -3 option is not selected, lines contained in both files shall be written using the format:

    "%s%s\n", <lead>, <line in both>

where the string <lead> is:

    <tab><tab>     if neither the -1 nor the -2 option is selected, or
    <tab>          if exactly one of the -1 and -2 options is selected, or
    null string    if both the -1 and -2 options are selected.

If the input files were ordered according to the collating sequence of the current locale, the lines written shall be in the collating sequence of the original lines.

4.11.6.2 Standard Error

Used only for diagnostic messages.
4.11.6.3 Output Files

None.

4.11.7 Extended Description

None.

4.11.8 Exit Status

The comm utility shall exit with one of the following values:

 0    All input files were successfully output as specified.

>0    An error occurred.

4.11.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.11.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

If the input files are not properly presorted, the output of comm might not be useful.

Suppose a file named posix.2 contains a sorted list of the utilities in this standard, a file named xpg3 contains a sorted list of the utilities specified in X/Open Portability Guide Issue 3, and a file named svid89 contains a sorted list of the utilities in the System V Interface Definition Third Edition. Then:

    comm -23 posix.2 xpg3 | comm -23 - svid89

would print a list of utilities in this standard not specified by either of the other documents,

    comm -12 posix.2 xpg3 | comm -12 - svid89

would print a list of utilities specified by all three documents, and

    comm -12 xpg3 svid89 | comm -23 - posix.2

would print a list of utilities specified by both XPG3 and SVID, but not specified in this standard.

History of Decisions Made

None.

END_RATIONALE

4.12 command - Execute a simple command

4.12.1 Synopsis

    command [-p] command_name [argument ...]

4.12.2 Description

The command utility shall cause the shell to treat the arguments as a simple command, suppressing the shell function lookup that is described in 3.9.1.1 item (1)(b).
If the command_name is the same as the name of one of the special built-in utilities, the special properties in the enumerated list at the beginning of 3.14 shall not occur. In every other respect, if command_name is not the name of a function, the effect of command shall be the same as omitting command.

4.12.3 Options

The command utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following option shall be supported by the implementation:

-p    Perform the command search using a default value for PATH that is guaranteed to find all of the standard utilities.

4.12.4 Operands

The following operands shall be supported by the implementation:

argument    One of the strings treated as an argument to command_name.

command_name    The name of a utility or a special built-in utility.

4.12.5 External Influences

4.12.5.1 Standard Input

None.

4.12.5.2 Input Files

None.

4.12.5.3 Environment Variables

The following environment variables shall affect the execution of command:

LANG    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters).

LC_MESSAGES    This variable shall determine the language in which messages should be written.
PATH    This variable shall determine the search path used during the command search described in 3.9.1.1, except as described under the -p option.

4.12.5.4 Asynchronous Events

Default.

4.12.6 External Effects

4.12.6.1 Standard Output

None.

4.12.6.2 Standard Error

Used only for diagnostic messages.

4.12.6.3 Output Files

None.

4.12.7 Extended Description

None.

4.12.8 Exit Status

The command utility shall exit with one of the following values:

126    The utility specified by command_name was found but could not be invoked.

127    An error occurred in the command utility or the utility specified by command_name could not be found.

Otherwise, the exit status of command shall be that of the simple command specified by the arguments to command.

4.12.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.12.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The order for command search in POSIX.2 allows functions to override regular built-ins and path searches. This utility is necessary to allow functions that have the same name as a utility to call the utility (instead of a recursive call to the function).

The system default path is available using getconf. However, since getconf may need to have the PATH set up before it can be called itself, the following can be used:

    command -p getconf _CS_PATH

Since command appears in Table 2-2, it will always be found prior to the PATH search.
There is nothing in the description of command that implies the command line is parsed any differently than for any other simple command. For example,

    command a | b ; c

is not parsed in any special way that causes | or ; to be treated other than as a pipe operator or semicolon, or that prevents function lookup on b or c.

Examples:

Make a version of cd that always prints out the new working directory exactly once:

    cd() {
        command cd "$@" >/dev/null
        pwd
    }

Start off a ``secure shell script'' in which the script avoids being spoofed by its parent:

IFS=' 	
'
#    The preceding value should be <space><tab><newline>.
#    Set IFS to its default value.

\unset -f command    # Ensure command is not a user function.
#    Note that unset is escaped to prevent an alias being used
#    for unset on implementations that support aliases.

PATH="$(\command -p getconf _CS_PATH):$PATH"
#    Put on a reliable PATH prefix.

#    Now, unset all utility names that will be used (or
#    invoke them with \command each time).
#    ...

At this point, given correct permissions on the directories named by PATH, the script has the ability to ensure that any utility it calls is the intended one. It is being very cautious because it assumes that implementation extensions may be present that would allow user aliases and/or functions to exist when it is invoked; neither capability is specified by POSIX.2, but neither is prohibited as an extension. For example, the proposed UPE supplement to POSIX.2 introduces an ENV variable that precedes the invocation of the script with a user startup script. Such a script could have used the aliasing facility from the UPE or the functions in POSIX.2 to spoof the application.
The command, env, nohup, and xargs utilities have been specified to use exit code 127 if an error occurs so that applications can distinguish ``failure to find a utility'' from ``invoked utility exited with an error indication.'' The value 127 was chosen because it is not commonly used for other meanings; most utilities use small values for ``normal error conditions'' and the values above 128 can be confused with termination due to receipt of a signal. The value 126 was chosen in a similar manner to indicate that the utility could be found, but not invoked. Some scripts produce meaningful error messages differentiating the 126 and 127 cases. The distinction between exit codes 126 and 127 is based on KornShell practice that uses 127 when all attempts to exec the utility fail with [ENOENT], and uses 126 when any attempt to exec the utility fails for any other reason.

History of Decisions Made

The command utility is somewhat similar to the Eighth Edition builtin command, but since command also goes to the file system to search for utilities, the name builtin would not be intuitive.

The command utility will most likely be provided as a regular built-in. In an earlier draft, it was a special built-in. This was changed for the following reasons:

- The removal of exportable functions made the special precedence of a special built-in unnecessary.

- A special built-in has special properties (see the enumerated list at the beginning of 3.14) that were inappropriate for invoking other utilities.
  For example, two commands such as

      date > unwritable-file

      command date > unwritable-file

  would have entirely different results; in a noninteractive script, the former would continue to execute the next command, and the latter would abort. Introducing this semantic difference along with suppressing functions was seen to be nonintuitive.

- There are some advantages to suppressing the special characteristics of special built-ins on occasion. For example:

      command exec > unwritable-file

  will not cause a noninteractive script to abort, so that the output status can be checked by the script.

An earlier draft presented a larger number of options. Most were removed because they were not useful to real portable applications, given the new command search order.

The -p option is present because it is useful to be able to ensure a safe path search that will find all the POSIX.2 standard utilities. This search might not be identical to the one that occurs through one of the POSIX.1 {8} exec functions when PATH is unset, as explained in 2.6.1. At the very least, this feature is required to allow the script to access the correct version of getconf so that the value of the default path can be accurately retrieved.

END_RATIONALE

4.13 cp - Copy files

4.13.1 Synopsis

    cp [-fip] source_file target_file
    cp [-fip] source_file ... target
    cp -R [-fip] source_file ... target
    cp -r [-fip] source_file ... target

4.13.2 Description

The first synopsis form is denoted by two operands, neither of which is an existing file of type directory. The cp utility shall copy the contents of source_file to the destination path named by target_file.
The second synopsis form is denoted by two or more operands where the -R or -r options are not specified and the first synopsis form is not applicable. It shall be an error if any source_file is a file of type directory, if target does not exist, or if target is a file of a type defined by POSIX.1 {8}, but is not a file of type directory. The cp utility shall copy the contents of each source_file to the destination path named by the concatenation of target, a slash character, and the last component of source_file.

The third and fourth synopsis forms are denoted by two or more operands where the -R or -r options are specified. The cp utility shall copy each file in the file hierarchy rooted in each source_file to a destination path named as follows.

If target exists and is a file of type directory, the name of the corresponding destination path for each file in the file hierarchy shall be the concatenation of target, a slash character, and the pathname of the file relative to the directory containing source_file.

If target does not exist, and two operands are specified, the name of the corresponding destination path for source_file shall be target; the name of the corresponding destination path for all other files in the file hierarchy shall be the concatenation of target, a slash character, and the pathname of the file relative to source_file.

It shall be an error if target does not exist and more than two operands are specified, or if target exists and is a file of a type defined by POSIX.1 {8}, but is not a file of type directory.
In the following description, source_file refers to the file that is being copied, whether specified as an operand or a file in a file hierarchy rooted in a source_file operand. The term dest_file refers to the file named by the destination path.

For each source_file, the following steps shall be taken:

(1) If source_file references the same file as dest_file, cp may write a diagnostic message to standard error; it shall do nothing more with source_file and shall go on to any remaining files.

(2) If source_file is of type directory, the following steps shall be taken:

    (a) If neither the -R nor the -r option was specified, cp shall write a diagnostic message to standard error, do nothing more with source_file, and go on to any remaining files.

    (b) If source_file was not specified as an operand and source_file is dot or dot-dot, cp shall do nothing more with source_file and go on to any remaining files.

    (c) If dest_file exists and it is a file type not specified by POSIX.1 {8}, the behavior is implementation defined.

    (d) If dest_file exists and it is not of type directory, cp shall write a diagnostic message to standard error, do nothing more with source_file or any files below source_file in the file hierarchy, and go on to any remaining files.

    (e) If the directory dest_file does not exist, it shall be created with file permission bits set to the same value as those of source_file, modified by the file creation mask of the user if the -p option was not specified, and then bitwise inclusively ORed with S_IRWXU. If dest_file cannot be created, cp shall write a diagnostic message to standard error, do nothing more with source_file, and go on to any remaining files.
        It is unspecified whether cp shall attempt to copy files in the file hierarchy rooted in source_file.

    (f) The files in the directory source_file shall be copied to the directory dest_file, taking the four steps [(1)-(4)] listed here with the files as source_files.

    (g) If dest_file was created, its file permission bits shall be changed (if necessary) to be the same as those of source_file, modified by the file creation mask of the user if the -p option was not specified.

    (h) The cp utility shall do nothing more with source_file and go on to any remaining files.

(3) If source_file is of type regular file, the following steps shall be taken:

    (a) If dest_file exists, the following steps are taken:

        [1] If the -i option is in effect, the cp utility shall write a prompt to the standard error and read a line from the standard input. If the response is not affirmative, cp shall do nothing more with source_file and go on to any remaining files.

        [2] A file descriptor for dest_file shall be obtained by performing actions equivalent to the POSIX.1 {8} open() function called using dest_file as the path argument, and the bitwise inclusive OR of O_WRONLY and O_TRUNC as the oflag argument.

        [3] If the attempt to obtain a file descriptor fails and the -f option is in effect, cp shall attempt to remove the file by performing actions equivalent to the POSIX.1 {8} unlink() function called using dest_file as the path argument. If this attempt succeeds, cp shall continue with step (3b).
    (b) If dest_file does not exist, a file descriptor shall be obtained by performing actions equivalent to the POSIX.1 {8} open() function called using dest_file as the path argument, and the bitwise inclusive OR of O_WRONLY and O_CREAT as the oflag argument. The file permission bits of source_file shall be the mode argument.

    (c) If the attempt to obtain a file descriptor fails, cp shall write a diagnostic message to standard error, do nothing more with source_file, and go on to any remaining files.

    (d) The contents of source_file shall be written to the file descriptor. Any write errors shall cause cp to write a diagnostic message to standard error and continue to step (3)(e).

    (e) The file descriptor shall be closed.

    (f) The cp utility shall do nothing more with source_file. If a write error occurred in step (3d), it is unspecified whether cp continues with any remaining files. If no write error occurred in step (3d), cp shall go on to any remaining files.

(4) Otherwise, the following steps shall be taken:

    (a) If the -r option was specified, the behavior is implementation defined.

    (b) If the -R option was specified, the following steps shall be taken:

        [1] The dest_file shall be created with the same file type as source_file.

        [2] If source_file is a file of type FIFO, the file permission bits shall be the same as those of source_file, modified by the file creation mask of the user if the -p option was not specified. Otherwise, the permissions, owner ID, and group ID of dest_file are implementation defined.
            If this creation fails for any reason, cp shall write a diagnostic message to standard error, do nothing more with source_file, and go on to any remaining files.

If the implementation provides additional or alternate access control mechanisms (see 2.2.2.55), their effect on copies of files is implementation defined.

4.13.3 Options

The cp utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-f    If a file descriptor for a destination file cannot be obtained, as described in step (3a)[2], attempt to unlink the destination file and proceed.

-i    Write a prompt to standard error before copying to any existing destination file. If the response from the standard input is affirmative, the copy shall be attempted, otherwise not.

-p    Duplicate the following characteristics of each source file in the corresponding destination file:

      (1) The time of last data modification and time of last access. If this duplication fails for any reason, cp shall write a diagnostic message to standard error.

      (2) The user ID and group ID. If this duplication fails for any reason, it is unspecified whether cp writes a diagnostic message to standard error.

      (3) The file permission bits and the S_ISUID and S_ISGID bits. Other, implementation-defined, bits may be duplicated as well. If this duplication fails for any reason, cp shall write a diagnostic message to standard error.

      If the user ID or the group ID cannot be duplicated, the file permission bits S_ISUID and S_ISGID shall be cleared. If these bits are present in the source file but are not duplicated in the destination file, it is unspecified whether cp writes a diagnostic message to standard error.
     The order in which the preceding characteristics are duplicated is unspecified. The dest_file shall not be deleted if these characteristics cannot be preserved.

-R   Copy file hierarchies.

-r   Copy file hierarchies. The treatment of special files is implementation defined.

4.13.4 Operands

The following operands shall be supported by the implementation:

source_file   A pathname of a file to be copied.

target_file   A pathname of an existing or nonexisting file, used for the output when a single file is copied.

target        A pathname of a directory to contain the copied file(s).

4.13.5 External Influences

4.13.5.1 Standard Input

Used to read an input line in response to each prompt specified in Standard Error. Otherwise, the standard input shall not be used.

4.13.5.2 Input Files

The input files specified as operands may be of any file type.

4.13.5.3 Environment Variables

The following environment variables shall affect the execution of cp:

LANG          This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL        This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_COLLATE    This variable shall determine the locale for the behavior of ranges, equivalence classes, and multicharacter collating elements used in the extended regular expression defined for the yesexpr locale keyword in the LC_MESSAGES category.
LC_CTYPE      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments) and the behavior of character classes used in the extended regular expression defined for the yesexpr locale keyword in the LC_MESSAGES category.

LC_MESSAGES   This variable shall determine the processing of affirmative responses and the language in which messages should be written.

4.13.5.4 Asynchronous Events

Default.

4.13.6 External Effects

4.13.6.1 Standard Output

None.

4.13.6.2 Standard Error

A prompt shall be written to standard error under the conditions specified in 4.13.2. The prompt shall contain the destination pathname, but its format is otherwise unspecified. Otherwise, the standard error shall be used only for diagnostic messages.

4.13.6.3 Output Files

The output files may be of any type.

4.13.7 Extended Description

None.

4.13.8 Exit Status

The cp utility shall exit with one of the following values:

 0   No error occurred.
>0   An error occurred.

4.13.9 Consequences of Errors

If cp is prematurely terminated by a signal or error, files or file hierarchies may be only partially copied, and files and directories may have incorrect permissions or access and modification times.

BEGIN_RATIONALE

4.13.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

None.

History of Decisions Made

The -i option exists on BSD systems, giving applications and users a way to avoid accidentally removing files when copying.
Although the 4.3BSD version does not prompt if the standard input is not a terminal, the working group decided that use of -i is a request for interaction, so when the destination path exists, the utility takes instructions from whatever responds on standard input.

The exact format of the interactive prompts is unspecified. Only the general nature of the contents of prompts is specified, because implementations may desire more descriptive prompts than those used on historical implementations. Therefore, an application using the -i option relies on the system to provide the most suitable dialogue directly with the user, based on the behavior specified.

The -p option is historical practice on BSD systems, duplicating the time of last data modification and time of last access. POSIX.2 extends it to preserve the user and group IDs, as well as the file permissions. This requirement has obvious problems in that the directories are almost certainly modified after being copied. This specification requires that the modification times be preserved even so. The order in which the characteristics are duplicated is left unspecified to permit implementations to provide the maximum amount of security for the user. Implementations should take into account the obvious security issues involved in setting the owner, group, and mode in the wrong order or creating files with an owner, group, or mode different from the final value.

It is unspecified whether cp writes diagnostic messages when the user and group IDs cannot be set, because of the widespread practice of users using -p to duplicate some portion of the file characteristics while remaining indifferent to the duplication of others. Historic implementations only write diagnostic messages on errors other than [EPERM].

The -r option is historical practice on BSD and BSD-derived systems, copying file hierarchies as opposed to single files.
This functionality is used heavily in existing applications, and its loss would significantly decrease consensus. The -R option was added as a close synonym for the -r option, selected for consistency with all other options in the standard that do recursive directory descent.

The difference between -R and -r is in the treatment by cp of file types other than regular and directory. The original -r flag, for historic reasons, does not handle special files any differently from regular files, but always reads the file and copies its contents. This has obvious problems in the presence of special file types, for example character devices, FIFOs, and sockets. The current cp utility specification is intended to require that the -R option recreate the file hierarchy and that the -r option support historical practice. It is anticipated that a future version of this standard will deprecate the -r option, and for that reason, there has been no attempt to fix its behavior with respect to FIFOs or other file types where copying the file is clearly wrong. However, some systems support -r with the same abilities as the -R defined in POSIX.2. To accommodate them as well as systems that do not, the differences between -r and -R are implementation defined. Implementations may make them identical.

When a failure occurs during the copying of a file hierarchy, cp is required to attempt to copy files that are on the same level in the hierarchy or above the file where the failure occurred. It is unspecified whether cp attempts to copy files below the file where the failure occurred (which cannot succeed in any case).

Permissions, owners, and groups of created special file types have been deliberately left as implementation defined.
This is to allow systems to satisfy special requirements (for example, allowing users to create character special devices, but requiring them to be owned by a certain group). In general, it is strongly suggested that the permissions, owner, and group be the same as if the user had run the traditional mknod, ln, or other utility to create the file. It is also probable that additional privileges will be required to create block, character, or other, implementation-specific, special file types.

Additionally, the -p option explicitly requires that all set-user-ID and set-group-ID permissions be discarded if any of the owner or group IDs cannot be set. This is to keep users from unintentionally giving away special privilege when copying programs.

When creating regular files, historical versions of cp use the mode of the source file as modified by the file mode creation mask. Other choices would have been to use the mode of the source file unmodified by the creation mask, or to use the same mode as would be given to a new file created by the user, plus the execution bits of the source file, and then modified by the file mode creation mask. In the absence of any strong reason to change historic practice, it was in large part retained. The one difference is that the set-user-ID and set-group-ID bits are explicitly cleared when files are created. This is to prevent users from creating programs that are set-user-ID/set-group-ID to them when copying files, or from making set-user-ID/set-group-ID files accessible to new groups of users. For example, if a file is set-user-ID and the copy has a different group ID than the source, a different group of users now has execute permission on a set-user-ID program than had it before. In particular, this is a problem for super-users copying users' trees.
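The mode rules in the preceding paragraphs can be illustrated with a short Python sketch. It is not taken from any historical implementation. The names PERMISSION_BITS, created_file_mode, preserved_mode, and copy_to_new_file are hypothetical, and os.open() stands in for the POSIX.1 open() call of step (3)(b). Only the permission bits are passed as the mode, so open() applies the file mode creation mask and the set-ID bits can never be carried over.

```python
import os
import stat

# File permission bits only (S_IRWXU | S_IRWXG | S_IRWXO); S_ISUID/S_ISGID are excluded.
PERMISSION_BITS = stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO
SET_ID_BITS = stat.S_ISUID | stat.S_ISGID


def created_file_mode(source_mode: int, umask: int) -> int:
    """Mode a new regular file ends up with when -p is not given.

    Only the permission bits are passed to open(), and open() removes the
    bits set in the file mode creation mask, so set-ID bits never survive.
    """
    return source_mode & PERMISSION_BITS & ~umask


def preserved_mode(source_mode: int, ids_preserved: bool) -> int:
    """Mode applied under -p: permission bits plus set-ID bits, unless the
    user ID or group ID could not be duplicated, in which case S_ISUID and
    S_ISGID are cleared."""
    mode = source_mode & (PERMISSION_BITS | SET_ID_BITS)
    if not ids_preserved:
        mode &= ~SET_ID_BITS
    return mode


def copy_to_new_file(source: str, dest: str, bufsize: int = 64 * 1024) -> None:
    """Copy source to a dest that does not yet exist (no -p handling).

    Errors surface as OSError; cp would report them as diagnostics instead.
    """
    mode = os.stat(source).st_mode & PERMISSION_BITS
    # O_WRONLY | O_CREAT with the source permission bits; the kernel applies the umask.
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT, mode)
    try:
        with open(source, "rb") as src:
            while True:
                chunk = src.read(bufsize)
                if not chunk:
                    break
                view = memoryview(chunk)
                while view:  # os.write() may write only part of the buffer
                    written = os.write(fd, view)
                    view = view[written:]
    finally:
        os.close(fd)
```

Consider a umask of 022. A source of mode 4755 then yields a new file of mode 755 (created_file_mode(0o4755, 0o022) == 0o755). Likewise, preserved_mode(0o6755, False) drops both set-ID bits and returns 0o755.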
A finer granularity of protection could be specified, in that the set-user-ID/set-group-ID bits could be retained under certain conditions even if the owner or group could not be set, based on a determination that no additional privileges were provided to any users. This was not seen as sufficiently useful for the added complexity.

When creating directories, historical versions of cp use the mode of the source directory, plus read, write, and search bits for the owner, as modified by the file mode creation mask. This is done so that cp can copy trees where the user has read permission, but the owner does not. A side effect is that if the file creation mask denies the owner permissions, cp will fail. Also, once the copy is done, historical versions of cp set the permissions on the created directory to be the same as the source directory, unmodified by the file creation mask.

This behavior has been modified so that cp will always be able to create the contents of the directory, regardless of the file creation mask. After the copy is done, the permissions are set to be the same as the source directory, as modified by the file creation mask. This latter change from historical behavior is to prevent users from accidentally creating directories with permissions beyond those they would normally set and for consistency with the behavior of cp in creating files.

It is not a requirement that cp detect attempts to copy a file to itself; however, implementations are strongly encouraged to do so. Historical implementations have detected the attempt in most cases, which is probably all that is needed.

There are two methods of copying subtrees in this standard. The other method is described as part of the pax utility (see 4.48). Both methods are historical practice. The cp utility provides a simpler, more intuitive interface, while pax offers a finer granularity of control.
Each provides additional functionality to the other; in particular, pax maintains the hard-link structure of the hierarchy, while cp does not. It is the intention of the working group that the results be similar (using appropriate option combinations in both utilities). The results are not required to be identical; there seemed insufficient gain to applications to balance the difficulty of implementations having to guarantee that the results would be exactly identical.

The wording allowing cp to copy a directory to implementation-defined file types not specified by POSIX.1 {8} is provided so that implementations supporting symbolic links are not required to prohibit copying directories to symbolic links. Other extensions to POSIX.1 {8} file types may need to use this loophole as well.

END_RATIONALE

4.14 cut - Cut out selected fields of each line of a file

4.14.1 Synopsis

cut -b list [-n] [file ...]
cut -c list [file ...]
cut -f list [-d delim] [-s] [file ...]

4.14.2 Description

The cut utility shall cut out bytes (-b option), characters (-c option), or character-delimited fields (-f option) from each line in one or more files, concatenate them, and write them to standard output.

4.14.3 Options

The cut utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The option-argument list (see options -b, -c, and -f below) shall be a comma-separated list or <blank>-separated list of positive numbers and ranges. Ranges can be in three forms. The first is two positive numbers separated by a hyphen (low-high), which represents all fields from the first number to the second number. The second is a positive number preceded by a hyphen (-high), which represents all fields from field number 1 to that number.
The third is a positive number followed by a hyphen (low-), which represents that number to the last field, inclusive. The elements in list can be repeated, can overlap, and can be specified in any order.

The following options shall be supported by the implementation:

-b list    Cut based on a list of bytes. Each selected byte shall be output unless the -n option is also specified. It shall not be an error to select bytes not present in the input line.

-c list    Cut based on a list of characters. Each selected character shall be output. It shall not be an error to select characters not present in the input line.

-d delim   Set the field delimiter to the character delim. The default is the <tab> character.

-f list    Cut based on a list of fields, assumed to be separated in the file by a delimiter character (see -d). Each selected field shall be output. Output fields shall be separated by a single occurrence of the field delimiter character. Lines with no field delimiters shall be passed through intact, unless -s is specified. It shall not be an error to select fields not present in the input line.

-n         Do not split characters. When specified with the -b option, each element in list of the form low-high (hyphen-separated numbers) shall be modified as follows: If the byte selected by low is not the first byte of a character, low shall be decremented to select the first byte of the character originally selected by low. If the byte selected by high is not the last byte of a character, high shall be decremented to select the last byte of the character prior to the character originally selected by high, or zero if there is no prior character.
           If the resulting range element has high equal to zero or low greater than high, the list element shall be dropped from list for that input line without causing an error.

           Each element in list of the form low- shall be treated as above with high set to the number of bytes in the current line, not including the terminating <newline> character. Each element in list of the form -high shall be treated as above with low set to 1. Each element in list of the form num (a single number) shall be treated as above with low set to num and high set to num.

-s         Suppress lines with no delimiter characters, when used with the -f option. Unless specified, lines with no delimiters shall be passed through untouched.

4.14.4 Operands

The following operands shall be supported by the implementation:

file   A pathname of an input file. If no file operands are specified, or if a file operand is -, the standard input shall be used.

4.14.5 External Influences

4.14.5.1 Standard Input

The standard input shall be used only if no file operands are specified, or if a file operand is -. See Input Files.

4.14.5.2 Input Files

The input files shall be text files, except that line lengths shall be unlimited.

4.14.5.3 Environment Variables

The following environment variables shall affect the execution of cut:

LANG          This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL        This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
LC_CTYPE      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

LC_MESSAGES   This variable shall determine the language in which messages should be written.

4.14.5.4 Asynchronous Events

Default.

4.14.6 External Effects

4.14.6.1 Standard Output

The cut utility output shall be a concatenation of the selected bytes, characters, or fields (one of the following):

"%s\n", <concatenation of bytes>
"%s\n", <concatenation of characters>
"%s\n", <concatenation of fields and field delimiters>

4.14.6.2 Standard Error

Used only for diagnostic messages.

4.14.6.3 Output Files

None.

4.14.7 Extended Description

None.

4.14.8 Exit Status

The cut utility shall exit with one of the following values:

 0   All input files were output successfully.
>0   An error occurred.

4.14.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.14.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

Examples of the option qualifier list:

1,4,7    Select the first, fourth, and seventh bytes, characters, or fields and field delimiters.
1-3,8    Equivalent to 1,2,3,8.
-5,10    Equivalent to 1,2,3,4,5,10.
3-       Equivalent to third through last.

The low-high forms are not always equivalent when used with -b and -n and multibyte characters. See the description of -n.
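The list syntax defined in 4.14.3 and the examples above can be modeled with a short Python sketch. The helper names (parse_list, selected_positions, cut_fields) are hypothetical. Positions past the end of a line are silently ignored, as the -b, -c, and -f descriptions require. In this sketch, selected items come out in the order they occur on the line, whatever their order in list. cut_fields is only a rough model of -f, -d, and -s applied to one line without its <newline>. It does not cover -b, -c, or -n.

```python
import re
from typing import List, Optional, Tuple

Range = Tuple[int, Optional[int]]  # (low, high); high is None for the "low-" form


def parse_list(spec: str) -> List[Range]:
    """Parse a comma- or <blank>-separated cut list into (low, high) pairs."""
    ranges: List[Range] = []
    for item in re.split(r"[, \t]+", spec.strip()):
        if not item:
            continue
        low_text, hyphen, high_text = item.partition("-")
        if not hyphen:                                    # "num": a single position
            low, high = int(low_text), int(low_text)
        else:
            low = int(low_text) if low_text else 1        # "-high" starts at position 1
            high = int(high_text) if high_text else None  # "low-" runs to the last item
        if low < 1 or (high is not None and high < 1):
            raise ValueError("list elements must be positive: %r" % item)
        ranges.append((low, high))
    return ranges


def selected_positions(spec: str, count: int) -> List[int]:
    """1-based positions selected from a line holding `count` bytes, characters, or fields.

    Repeated and overlapping elements collapse; positions past `count` are ignored.
    """
    chosen = set()
    for low, high in parse_list(spec):
        last = count if high is None else min(high, count)
        chosen.update(range(low, last + 1))
    return sorted(chosen)


def cut_fields(line: str, spec: str, delim: str = "\t", suppress: bool = False) -> Optional[str]:
    """Rough model of 'cut -f list [-d delim] [-s]' on one line; None means the line is suppressed."""
    if delim not in line:
        return None if suppress else line  # no delimiter: pass through unless -s
    fields = line.split(delim)
    return delim.join(fields[i - 1] for i in selected_positions(spec, len(fields)))
```

Applied to a password-file entry, cut_fields("root:x:0:0:Super-User:/root:/bin/sh", "1,6", ":") returns "root:/root". This mirrors the /etc/passwd command in the next example.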
The following command:

cut -d : -f 1,6 /etc/passwd

reads the System V password file (user database) and produces lines of the form:

<user ID>:<home directory>

Most utilities in this standard work on text files. The cut utility can be used to turn files with arbitrary line lengths into a set of text files containing the same data. The paste utility can be used to create (or recreate) files with arbitrary line lengths. For example, if file contains long lines, the commands:

cut -b 1-500 -n file > file1
cut -b 501- -n file > file2

create file1 (a text file) with lines no longer than 500 bytes (plus the <newline> character) and file2 that contains the remainder of the data from file. (Note that file2 will not be a text file if there are lines in file that are longer than 500 + {LINE_MAX} bytes.) The original file can be recreated from file1 and file2 using the command:

paste -d "\0" file1 file2 > file

History of Decisions Made

Some historical implementations do not count <backspace> characters in determining character counts with the -c option. This may be useful for using cut for processing nroff output. It was deliberately decided not to have the -c option treat either <backspace> or <tab> characters in any special fashion. The fold utility does treat these characters specially.

Unlike other utilities, some historical implementations of cut exit after not finding an input file, rather than continuing to process the remaining file operands. This standard prohibits that behavior; a missing input file affects only the exit status.

The behavior of cut when provided with either mutually exclusive options or options that do not make sense together has been deliberately left unspecified in favor of global wording in Section 2.
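The -n adjustment described under 4.14.3 can be sketched in Python as follows. The sketch makes three assumptions. First, the line is taken to be valid UTF-8, whereas the standard leaves the encoding to the locale's LC_CTYPE. Second, list elements arrive as (low, high) pairs, with None standing for the open end of the -high and low- forms. Third, high is clamped to the line length before rounding; that clamp is this sketch's own choice. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int]                         # 1-based (first byte, last byte) of one character
Element = Tuple[Optional[int], Optional[int]]  # (low, high); None is the open end of "-high" / "low-"


def char_spans(line: bytes, encoding: str = "utf-8") -> List[Span]:
    """Locate every character of `line` by its first and last byte position."""
    spans, pos = [], 1
    for ch in line.decode(encoding):
        size = len(ch.encode(encoding))
        spans.append((pos, pos + size - 1))
        pos += size
    return spans


def round_to_characters(low: int, high: int, spans: Sequence[Span]) -> Optional[Span]:
    """Apply the -n rule to one low-high element; None means the element is dropped."""
    high = min(high, spans[-1][1] if spans else 0)  # assumption: clamp high to the line length
    for first, last in spans:
        if first <= low <= last:
            low = first  # move low back to the first byte of its character
            break
    for i, (first, last) in enumerate(spans):
        if first <= high <= last:
            if high != last:  # high splits a character: end at the previous character
                high = spans[i - 1][1] if i > 0 else 0
            break
    if high == 0 or low > high:
        return None
    return low, high


def cut_bytes_no_split(line: bytes, elements: Sequence[Element], encoding: str = "utf-8") -> bytes:
    """Model of 'cut -b list -n' on one line without its <newline>."""
    spans = char_spans(line, encoding)
    keep = set()
    for low, high in elements:
        low = 1 if low is None else low              # "-high": low is 1
        high = len(line) if high is None else high   # "low-": high is the byte count
        kept = round_to_characters(low, high, spans)
        if kept:
            keep.update(range(kept[0], kept[1] + 1))
    return bytes(line[i - 1] for i in sorted(keep))
```

As an example, take the line "aé€b", which is 7 bytes in UTF-8. The elements (1, 2) and (3, None) select "a" and "é€b", so each character lands in exactly one of the two pieces. This is the property the rationale claims for splitting a file at byte 500.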
The traditional cut utility has worked in an environment where bytes and characters were equivalent (modulo <backspace> and <tab> processing in some implementations). In the extended world of multibyte characters, the new -b option has been added. The -n option (used with -b) allows it to be used to act on bytes rounded to character boundaries. The algorithm specified for -n guarantees that

cut -b 1-500 -n file > file1
cut -b 501- -n file > file2

will end up with all the characters in file appearing exactly once in file1 or file2. (There is, however, a <newline> in both file1 and file2 for each <newline> in file.)

END_RATIONALE

4.15 date - Write the date and time

4.15.1 Synopsis

date [-u] [+format]

4.15.2 Description

The date utility shall write the date and time to standard output. By default, the current date and time shall be written. If an operand beginning with + is specified, the output format of date shall be controlled by the field descriptors and other text in the operand.

4.15.3 Options

The date utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following option shall be supported by the implementation:

-u   Perform operations as if the TZ environment variable were set to the string UTC0, or its equivalent historical value of GMT0. Otherwise, date shall use the time zone indicated by the TZ environment variable or the system default if that variable is not set.

4.15.4 Operands

When the format is specified, each field descriptor shall be replaced in the standard output by its corresponding value. All other characters shall be copied to the output without change. The output shall always be terminated with a <newline> character.

Field Descriptors

%a   Locale's abbreviated weekday name.
%A   Locale's full weekday name.
%b   Locale's abbreviated month name.
%B   Locale's full month name.
%c   Locale's appropriate date and time representation.
%C   Century (a year divided by 100 and truncated to an integer) as a decimal number (00-99).
%d   Day of the month as a decimal number (01-31).
%D   Date in the format mm/dd/yy.
%e   Day of the month as a decimal number (1-31 in a two-digit field with leading <space> fill).
%h   A synonym for %b.
%H   Hour (24-hour clock) as a decimal number (00-23).
%I   Hour (12-hour clock) as a decimal number (01-12).
%j   Day of the year as a decimal number (001-366).
%m   Month as a decimal number (01-12).
%M   Minute as a decimal number (00-59).
%n   A <newline> character.
%p   Locale's equivalent of either AM or PM.
%r   12-hour clock time (01-12) using the AM/PM notation; in the POSIX Locale, this shall be equivalent to "%I:%M:%S %p".
%S   Seconds as a decimal number (00-61).
%t   A <tab> character.
%T   24-hour clock time (00-23) in the format HH:MM:SS.
%U   Week number of the year (Sunday as the first day of the week) as a decimal number (00-53).
%w   Weekday as a decimal number [0 (Sunday)-6].
%W   Week number of the year (Monday as the first day of the week) as a decimal number (00-53).
%x   Locale's appropriate date representation.
%X   Locale's appropriate time representation.
%y   Year (offset from %C) as a decimal number (00-99).
%Y   Year with century as a decimal number.
%Z   Time-zone name, or no characters if no time zone is determinable.
%%   A % character.

See the LC_TIME description in 2.5.2.5 for the field descriptor values in the POSIX Locale.

Modified Field Descriptors

Some field descriptors can be modified by the E and O modifier characters to indicate a different format or specification as specified in the LC_TIME locale description (see 2.5.2.5).
If the corresponding keyword (see era, era_year, era_d_fmt, and alt_digits in 2.5.2.5) is not specified or not supported for the current locale, the unmodified field descriptor value shall be used.

%Ec  Locale's alternate appropriate date and time representation.
%EC  The name of the base year (period) in the locale's alternate representation.
%Ex  Locale's alternate date representation.
%Ey  Offset from %EC (year only) in the locale's alternate representation.
%EY  Full alternate year representation.
%Od  Day of month using the locale's alternate numeric symbols.
%Oe  Day of month using the locale's alternate numeric symbols.
%OH  Hour (24-hour clock) using the locale's alternate numeric symbols.
%OI  Hour (12-hour clock) using the locale's alternate numeric symbols.
%Om  Month using the locale's alternate numeric symbols.
%OM  Minutes using the locale's alternate numeric symbols.
%OS  Seconds using the locale's alternate numeric symbols.
%OU  Week number of the year (Sunday as the first day of the week) using the locale's alternate numeric symbols.
%Ow  Weekday as number in the locale's alternate representation (Sunday = 0).
%OW  Week number of the year (Monday as the first day of the week) using the locale's alternate numeric symbols.
%Oy  Year (offset from %C) in alternate representation.

4.15.5 External Influences

4.15.5.1 Standard Input

None.

4.15.5.2 Input Files

None.

4.15.5.3 Environment Variables

The following environment variables shall affect the execution of date:

LANG          This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
LC_ALL        This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES   This variable shall determine the language in which messages should be written.

LC_TIME       This variable shall determine the format and contents of date and time strings written by date.

TZ            This variable shall specify the time zone in which the time and date are written, unless the -u option is specified. If the TZ variable is not set and the -u option is not specified, an unspecified system default time zone is used.

4.15.5.4 Asynchronous Events

Default.

4.15.6 External Effects

4.15.6.1 Standard Output

When no formatting operand is specified, the output in the POSIX Locale shall be equivalent to specifying:

date "+%a %b %e %H:%M:%S %Z %Y"

4.15.6.2 Standard Error

Used only for diagnostic messages.

4.15.6.3 Output Files

None.

4.15.7 Extended Description

None.

4.15.8 Exit Status

The date utility shall exit with one of the following values:

 0   The date was written successfully.
>0   An error occurred.

4.15.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.15.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The option for setting the date and time was not included. It is normally a system administration option, which is outside the scope of POSIX.2.
The following are input/output examples of date used at arbitrary times in the POSIX Locale:

$ date
Tue Jun 26 09:58:10 PDT 1990

$ date "+DATE: %m/%d/%y%nTIME: %H:%M:%S"
DATE: 11/21/87
TIME: 13:36:16

$ date "+TIME: %r"
TIME: 01:36:32 PM

Field descriptors are of unspecified format when not in the POSIX Locale. Some of them can contain <space> characters in some locales, so it may be difficult to use the format shown in Standard Output for parsing the output of date in those locales.

The range of values for %S extends from 0 to 61 seconds to accommodate the occasional leap second or double leap second.

Although certain of the field descriptors in the POSIX Locale (such as the name of the month) are shown with initial capital letters, this need not be the case in other locales. Programs using these fields may need to adjust the capitalization if the output is going to be used at the beginning of a sentence.

The date string formatting capabilities are intended for use in Gregorian-style calendars, possibly with a different starting year (or years). The %x and %c field descriptors, however, are intended for ``local representation''; these may be based on a different, non-Gregorian calendar.

The %C field descriptor was introduced to allow a fallback for the %EC (alternate year format base year); it can be viewed as the base of the current subdivision in the Gregorian calendar. A century is not calculated as an ordinal number; this standard was approved in century 19, not the twentieth (let's hope). Both the %Ey and %y can then be viewed as the offset from %EC and %C, respectively.

The E and O modifiers modify the traditional field descriptors, so that they can always be used, even if the implementation (or the current locale) does not support the modifier.
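The POSIX-Locale behavior of a few descriptors can be checked with a small Python sketch. It reproduces three things discussed above: the default output format from Standard Output, the %r format, and the %C/%y split. It hard-codes the POSIX Locale weekday and month abbreviations instead of consulting LC_TIME. It also takes the time-zone name as a parameter rather than processing TZ. The function names are hypothetical.

```python
from datetime import datetime

# POSIX Locale abbreviations; a real implementation would take these from LC_TIME.
WEEKDAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def default_output(t: datetime, zone_name: str) -> str:
    """Equivalent of date "+%a %b %e %H:%M:%S %Z %Y" in the POSIX Locale."""
    weekday = WEEKDAYS[(t.weekday() + 1) % 7]  # datetime counts Monday as 0; %w counts Sunday as 0
    return "%s %s %2d %02d:%02d:%02d %s %d" % (
        weekday, MONTHS[t.month - 1],
        t.day,  # %e: two-digit field with <space> fill
        t.hour, t.minute, t.second, zone_name, t.year)


def percent_r(t: datetime) -> str:
    """%r in the POSIX Locale, i.e. "%I:%M:%S %p"."""
    hour12 = t.hour % 12 or 12  # %I runs 01-12
    return "%02d:%02d:%02d %s" % (hour12, t.minute, t.second, "AM" if t.hour < 12 else "PM")


def century_and_offset(year: int) -> tuple:
    """(%C, %y) as zero-filled strings: year // 100 and year % 100, not an ordinal century."""
    return "%02d" % (year // 100), "%02d" % (year % 100)
```

For instance, default_output(datetime(1990, 6, 26, 9, 58, 10), "PDT") returns the first example above, "Tue Jun 26 09:58:10 PDT 1990". Similarly, century_and_offset(1991) returns ("19", "91"), matching the "century 19" remark.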
The E modifier supports alternate date formats, such as the Japanese Emperor's Era, as long as these are based on the Gregorian calendar system. Extending the E modifiers to other date elements may provide an implementation-specific extension capable of supporting other calendar systems, especially in combination with the O modifier.

The O modifier supports time and date formats using the locale's alternate numerical symbols, such as Kanji or Hindi digits, or ordinal number representation.

Non-European locales, whether they use Latin digits in computational items or not, often have local forms of the digits for use in date formats. This is not totally unknown even in Europe; a variant of dates uses Roman numerals for the months: the third day of September 1991 would be written as 3.IX.1991. In Japan, Kanji digits are regularly used for dates; in Arabic-speaking countries, Hindi digits are used. The %d, %e, %H, %I, %m, %S, %U, %w, %W, and %y field descriptors always return the date/time field in Latin digits (i.e., 0 through 9). The %O modifier was introduced to support the use for display purposes of non-Latin digits. In the LC_TIME category in localedef, the optional alt_digits keyword is intended for this purpose. As an example, assume the following (partial) localedef source:

alt_digits  "";"I";"II";"III";"IV";"V";"VI";"VII";"VIII";\
            "IX";"X";"XI";"XII"
d_fmt       "%e.%Om.%Y"

With the above date, the command

date "+%x"

would yield ``3.IX.1991.'' With the same d_fmt, but without the alt_digits, the command would yield ``3.9.1991.''

History of Decisions Made

Some of the new options for formatting are from the C Standard {7}.

The -u option was introduced to allow portable access to Coordinated Universal Time (UTC). The string GMT0 is allowed as an equivalent TZ value to be compatible with all of the systems using the BSD implementation, where this option originated.
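The %O fallback rule and the alt_digits example given under Examples, Usage can be modeled in Python. The table mirrors the illustrative localedef fragment: index 0 is the empty string and index 9 is "IX". The function names are hypothetical. The sketch falls back to the unmodified form when a value lies outside the table, which is an assumption of its own. Only the three descriptors in d_fmt "%e.%Om.%Y" are handled.

```python
from datetime import date
from typing import List, Optional

# Mirrors the example localedef fragment: index n holds the alternate symbol for n.
ROMAN_ALT_DIGITS = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII",
                    "IX", "X", "XI", "XII"]


def o_modified(value: int, width: int, alt_digits: Optional[List[str]]) -> str:
    """An %O-modified numeric field.

    Uses the locale's alt_digits entry when one exists; otherwise falls back to
    the unmodified descriptor (a zero-filled decimal field of `width` digits).
    """
    if alt_digits and 0 <= value < len(alt_digits):
        return alt_digits[value]
    return "%0*d" % (width, value)


def format_d_fmt(d: date, alt_digits: Optional[List[str]]) -> str:
    """Expand the d_fmt string "%e.%Om.%Y" for one date."""
    day = "%2d" % d.day                         # %e: two-digit field, <space> fill
    month = o_modified(d.month, 2, alt_digits)  # %Om, falling back to %m
    return "%s.%s.%d" % (day, month, d.year)
```

Here format_d_fmt(date(1991, 9, 3), ROMAN_ALT_DIGITS) returns " 3.IX.1991". The leading <space> comes from the <space> fill of %e.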
The %e format field descriptor (adopted from System V) was added because the C Standard {7} descriptors did not provide any way to produce the historical default date output during the first nine days of any month.

END_RATIONALE

4.16 dd - Convert and copy a file

4.16.1 Synopsis

    dd [operand ...]

4.16.2 Description

The dd utility shall copy the specified input file to the specified output file with possible conversions using specific input and output block sizes. It shall read the input one block at a time, using the specified input block size; it then shall process the block of data actually returned, which could be smaller than the requested block size. It shall apply any conversions that have been specified and write the resulting data to the output in blocks of the specified output block size. If the bs=expr operand is specified and no conversions other than sync or noerror are requested, the data returned from each input block shall be written as a separate output block; if the read returns less than a full block and the sync conversion is not specified, the resulting output block shall be the same size as the input block. If the bs=expr operand is not specified, or a conversion other than sync or noerror is requested, the input shall be processed and collected into full-sized output blocks until the end of the input is reached.

The processing order shall be as follows:

(1) An input block is read.

(2) If the input block is shorter than the specified input block size and the sync conversion is specified, null bytes shall be appended to the input data up to the specified size. The remaining conversions and output shall include the pad characters as if they had been read from the input.
(3) If the bs=expr operand is specified and no conversion other than sync or noerror is requested, the resulting data shall be written to the output as a single block, and the remaining steps are omitted.

(4) If the swab conversion is specified, each pair of input data bytes shall be swapped. If there is an odd number of bytes in the input block, the results are unspecified.

(5) Any remaining conversions (block, unblock, lcase, and ucase) shall be performed. These conversions shall operate on the input data independently of the input blocking; an input or output fixed-length record may span block boundaries.

(6) The data resulting from input or conversion or both shall be aggregated into output blocks of the specified size. After the end of input is reached, any remaining output shall be written as a block without padding if conv=sync is not specified; thus the final output block may be shorter than the output block size.

4.16.3 Options

None.

4.16.4 Operands

All of the operands shall be processed before any input is read. The following operands shall be supported by the implementation:

if=file
    Specify the input pathname; the default is standard input.

of=file
    Specify the output pathname; the default is standard output. If the seek=expr operand is not also specified, the output file shall be truncated before the copy begins, unless conv=notrunc is specified. If seek=expr is specified, but conv=notrunc is not, the effect of the copy shall be to preserve the blocks in the output file over which dd seeks, but no other portion of the output file shall be preserved. (If the size of the seek plus the size of the input file is less than the previous size of the output file, the output file shall be shortened by the copy.)
ibs=expr
    Specify the input block size, in bytes, by expr (default is 512).

obs=expr
    Specify the output block size, in bytes, by expr (default is 512).

bs=expr
    Set both input and output block sizes to expr bytes, superseding ibs= and obs=. If no conversion other than sync, noerror, and notrunc is specified, each input block shall be copied to the output as a single block without aggregating short blocks.

cbs=expr
    Specify the conversion block size for block and unblock in bytes by expr (default is zero). If cbs= is omitted or given a value of zero, using block or unblock produces unspecified results.

skip=n
    Skip n input blocks (using the specified input block size) before starting to copy. On seekable files, the implementation shall read the blocks or seek past them; on nonseekable files, the blocks shall be read and the data shall be discarded.

seek=n
    Skip n blocks (using the specified output block size) from the beginning of the output file before copying. On nonseekable files, existing blocks shall be read and space from the current end of file to the specified offset, if any, filled with null bytes; on seekable files, the implementation shall seek to the specified offset or read the blocks as described for nonseekable files.

count=n
    Copy only n input blocks.

conv=value[,value ...]
    Where values are comma-separated symbols from the following list:

    block
        Treat the input as a sequence of <newline>-terminated or end-of-file-terminated variable-length records independent of the input block boundaries. Each record shall be converted to a record with a fixed length specified by the conversion block size. Any <newline> shall be removed from the input line; <space>s shall be appended to lines that are shorter than their conversion block size to fill the block. Lines that are longer than the conversion block size shall be truncated to the largest number of characters that will fit into that size; the number of truncated lines shall be reported (see Standard Error below). The block and unblock values are mutually exclusive.

    unblock
        Convert fixed-length records to variable length. Read a number of bytes equal to the conversion block size, delete all trailing <space>s, and append a <newline>.

    lcase
        Map uppercase characters specified by the LC_CTYPE keyword tolower to the corresponding lowercase character. Characters for which no mapping is specified shall not be modified by this conversion. The lcase and ucase symbols are mutually exclusive.

    ucase
        Map lowercase characters specified by the LC_CTYPE keyword toupper to the corresponding uppercase character. Characters for which no mapping is specified shall not be modified by this conversion.

    swab
        Swap every pair of input bytes.

    noerror
        Do not stop processing on an input error. When an input error occurs, a diagnostic message shall be written on standard error, followed by the current input and output block counts in the same format as used at completion (see Standard Error). If the sync conversion is specified, the missing input shall be replaced with null bytes and processed normally; otherwise, the input block shall be omitted from the output.

    notrunc
        Do not truncate the output file. Preserve blocks in the output file not explicitly written by this invocation of the dd utility. (See also the preceding of=file operand.)

    sync
        Pad every input block to the size of the ibs= buffer, appending null bytes.
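The following commands sketch several of the conversions listed above, assuming a dd that implements them as specified; printf supplies small inputs, and the block-count report on standard error is discarded. Operand combinations beyond these (for example, block without cbs=) are unspecified and may behave differently between implementations.

```sh
# block: <newline>-terminated records become fixed 4-byte records; each
# <newline> is removed and short records are padded with <space>s.
blocked=$(printf 'ab\ncd\n' | dd cbs=4 conv=block 2>/dev/null)
printf '[%s]\n' "$blocked"                  # [ab  cd  ]

# unblock: read 4-byte records, drop trailing <space>s, append a <newline>.
unblocked=$(printf 'ab  cd  ' | dd cbs=4 conv=unblock 2>/dev/null)
printf '%s\n' "$unblocked"                  # ab and cd on separate lines

# swab: swap each pair of bytes (an even-length input avoids the
# unspecified odd-byte case).
swapped=$(printf 'abcd' | dd conv=swab 2>/dev/null)

# ucase: map lowercase to uppercase according to LC_CTYPE.
upper=$(printf 'Hello, World' | LC_ALL=POSIX dd conv=ucase 2>/dev/null)

# sync: the 3-byte input block is padded with null bytes to ibs= (8 bytes).
padded_len=$(printf 'abc' | dd ibs=8 conv=sync 2>/dev/null | wc -c | tr -d ' ')
```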
The behavior is unspecified if operands other than conv= are specified more than once.

For the bs=, cbs=, ibs=, and obs= operands, the application shall supply an expression specifying a size in bytes. The expression, expr, can be:

(1) a positive decimal number;

(2) a positive decimal number followed by k, specifying multiplication by 1024;

(3) a positive decimal number followed by b, specifying multiplication by 512; or

(4) two or more positive decimal numbers (with or without k or b) separated by x, specifying the product of the indicated values.

4.16.5 External Influences

4.16.5.1 Standard Input

If no if= operand is specified, the standard input shall be used. See Input Files.

4.16.5.2 Input Files

The input file can be any file type.

4.16.5.3 Environment Variables

The following environment variables shall affect the execution of dd:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files), the classification of characters as upper- or lowercase, and the mapping of characters from one case to the other.

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.16.5.4 Asynchronous Events

For SIGINT, the dd utility shall write status information to standard error before exiting. It shall take the standard action for all other signals; see 2.11.5.4.

4.16.6 External Effects

4.16.6.1 Standard Output

If no of= operand is specified, the standard output shall be used. The nature of the output depends on the operands selected.

4.16.6.2 Standard Error

On completion, dd shall write the number of input and output blocks to standard error. In the POSIX Locale the following formats shall be used:

    "%u+%u records in\n", <number of whole input blocks>, <number of partial input blocks>

    "%u+%u records out\n", <number of whole output blocks>, <number of partial output blocks>

A partial input block is one for which read() returned less than the input block size. A partial output block is one that was written with fewer bytes than specified by the output block size.

In addition, when there is at least one truncated block, the number of truncated blocks shall be written to standard error. In the POSIX Locale, the format shall be:

    "%u truncated %s\n", <number of truncated blocks>, "block" [if <number of truncated blocks> is one] "blocks" [otherwise]

Diagnostic messages may also be written to standard error.

4.16.6.3 Output Files

If the of= operand is used, the output shall be the same as described in Standard Output.

4.16.7 Extended Description

None.

4.16.8 Exit Status

The dd utility shall exit with one of the following values:

    0    The input file was copied successfully.
    >0   An error occurred.
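A minimal POSIX shell sketch of two mechanics described above: evaluating a size expression of the form accepted by bs=, cbs=, ibs=, and obs=, and reading the completion report that dd writes to standard error. The function name dd_size is hypothetical (dd performs this evaluation internally). The report check assumes the POSIX Locale and examines only the two lines required here, since some implementations write further statistics after them.

```sh
# Hypothetical helper: evaluate a dd size expression such as 512, 2k, 20b,
# or 2x3kx4b and print the byte count; return 1 if it is malformed.
dd_size() {
    case $1 in
        ''|x*|*x) return 1 ;;            # empty, or a dangling x separator
    esac
    product=1
    saved_ifs=$IFS
    set -f                               # no pathname expansion while splitting
    IFS=x
    set -- $1                            # one positional parameter per factor
    IFS=$saved_ifs
    set +f
    for factor in "$@"; do
        case $factor in
            *k) mult=1024; num=${factor%k} ;;
            *b) mult=512;  num=${factor%b} ;;
            *)  mult=1;    num=$factor ;;
        esac
        case $num in
            ''|*[!0-9]*) return 1 ;;     # must be a decimal number
        esac
        while :; do                      # strip leading zeros so that $(( ))
            case $num in                 # does not read the number as octal
                0?*) num=${num#0} ;;
                *) break ;;
            esac
        done
        [ "$num" -gt 0 ] || return 1     # must be positive
        product=$((product * num * mult))
    done
    printf '%s\n' "$product"
}

tmp=${TMPDIR:-/tmp}/dd_size_demo.$$
# 1300 bytes: with 512-byte blocks, two whole blocks and one partial block.
awk 'BEGIN { for (i = 0; i < 1300; i++) printf "a" }' > "$tmp"
blk=$(dd_size 1b)                        # 512
LC_ALL=POSIX dd if="$tmp" of=/dev/null bs="$blk" 2>"$tmp.err"
dd_status=$?
report=$(head -n 2 "$tmp.err")           # 2+1 records in / 2+1 records out
rm -f "$tmp" "$tmp.err"
```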
4.16.9 Consequences of Errors

If an input error is detected and the noerror conversion has not been specified, any partial output block shall be written to the output file, a diagnostic message shall be written, and the copy operation shall be discontinued. If some other error is detected, a diagnostic message shall be written and the copy operation shall be discontinued.

BEGIN_RATIONALE

4.16.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The input and output block size can be specified to take advantage of raw physical I/O.

The following command:

    dd if=/dev/rmt0h of=/dev/rmt1h

copies from tape drive 0 to tape drive 1, using a common historical device naming convention.

The following command:

    dd ibs=10 skip=1

strips the first 10 bytes from standard input.

A suggested implementation technique for conv=noerror,sync is to zero the input buffer before each read and to write the contents of the input buffer to the output even after an error. In this manner, any data transferred to the input buffer before the error was detected will be preserved. Another point is that a failed read on a regular file or a disk will generally not increment the file offset, and dd must then seek past the block on which the error occurred; otherwise, the input error will occur repetitively. When the input is a magnetic tape, however, the tape will normally have passed the block containing the error when the error is reported, and thus no seek is necessary.
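The skip example above can be checked against a regular file, and its output-side counterpart, seek= with and without conv=notrunc, shows the truncation rules given for the of= operand. A sketch in POSIX shell, assuming a writable temporary directory; the file name is arbitrary.

```sh
f=${TMPDIR:-/tmp}/dd_seek_demo.$$

# Input side: skip=1 with ibs=10 discards the first 10-byte block.
printf '0123456789tail' > "$f"
stripped=$(dd ibs=10 skip=1 < "$f" 2>/dev/null)

# Output side with conv=notrunc: overwrite bytes 2 and 3 in place and keep
# everything after them.
printf 'XY' | dd of="$f" bs=1 seek=2 conv=notrunc 2>/dev/null
kept=$(cat "$f")

# Output side without conv=notrunc: the two 1-byte blocks dd seeks over are
# preserved, but nothing after the newly written data survives.
printf '0123456789tail' > "$f"
printf 'XY' | dd of="$f" bs=1 seek=2 2>/dev/null
cut_short=$(cat "$f")

rm -f "$f"
printf '%s\n' "$stripped" "$kept" "$cut_short"   # tail, 01XY456789tail, 01XY
```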
History of Decisions Made

Table 4-4 - ASCII to EBCDIC Conversion

             0     1     2     3     4     5     6     7
    0000  0000  0001  0002  0003  0067  0055  0056  0057
    0010  0026  0005  0045  0013  0014  0015  0016  0017
    0020  0020  0021  0022  0023  0074  0075  0062  0046
    0030  0030  0031  0077  0047  0034  0035  0036  0037
    0040  0100  0132  0177  0173  0133  0154  0120  0175
    0050  0115  0135  0134  0116  0153  0140  0113  0141
    0060  0360  0361  0362  0363  0364  0365  0366  0367
    0070  0370  0371  0172  0136  0114  0176  0156  0157
    0100  0174  0301  0302  0303  0304  0305  0306  0307
    0110  0310  0311  0321  0322  0323  0324  0325  0326
    0120  0327  0330  0331  0342  0343  0344  0345  0346
    0130  0347  0350  0351  0255  0340  0275  0232* 0155
    0140  0171  0201  0202  0203  0204  0205  0206  0207
    0150  0210  0211  0221  0222  0223  0224  0225  0226
    0160  0227  0230  0231  0242  0243  0244  0245  0246
    0170  0247  0250  0251  0300  0117  0320  0137* 0007
    0200  0040  0041  0042  0043  0044  0025  0006  0027
    0210  0050  0051  0052  0053  0054  0011  0012  0033
    0220  0060  0061  0032  0063  0064  0065  0066  0010
    0230  0070  0071  0072  0073  0004  0024  0076  0341
    0240  0101  0102  0103  0104  0105  0106  0107  0110
    0250  0111  0121  0122  0123  0124  0125  0126  0127
    0260  0130  0131  0142  0143  0144  0145  0146  0147
    0270  0150  0151  0160  0161  0162  0163  0164  0165
    0300  0166  0167  0170  0200  0212  0213  0214  0215
    0310  0216  0217  0220  0152* 0233  0234  0235  0236
    0320  0237  0240  0252  0253  0254  0112* 0256  0257
    0330  0260  0261  0262  0263  0264  0265  0266  0267
    0340  0270  0271  0272  0273  0274  0241* 0276  0277
    0350  0312  0313  0314  0315  0316  0317  0332  0333
    0360  0334  0335  0336  0337  0352  0353  0354  0355
    0370  0356  0357  0372  0373  0374  0375  0376  0377

Table 4-5 - ASCII to IBM EBCDIC Conversion

             0     1     2     3     4     5     6     7
    0000  0000  0001  0002  0003  0067  0055  0056  0057
    0010  0026  0005  0045  0013  0014  0015  0016  0017
    0020  0020  0021  0022  0023  0074  0075  0062  0046
    0030  0030  0031  0077  0047  0034  0035  0036  0037
    0040  0100  0132  0177  0173  0133  0154  0120  0175
    0050  0115  0135  0134  0116  0153  0140  0113  0141
    0060  0360  0361  0362  0363  0364  0365  0366  0367
    0070  0370  0371  0172  0136  0114  0176  0156  0157
    0100  0174  0301  0302  0303  0304  0305  0306  0307
    0110  0310  0311  0321  0322  0323  0324  0325  0326
    0120  0327  0330  0331  0342  0343  0344  0345  0346
    0130  0347  0350  0351  0255  0340  0275  0137* 0155
    0140  0171  0201  0202  0203  0204  0205  0206  0207
    0150  0210  0211  0221  0222  0223  0224  0225  0226
    0160  0227  0230  0231  0242  0243  0244  0245  0246
    0170  0247  0250  0251  0300  0117  0320  0241* 0007
    0200  0040  0041  0042  0043  0044  0025  0006  0027
    0210  0050  0051  0052  0053  0054  0011  0012  0033
    0220  0060  0061  0032  0063  0064  0065  0066  0010
    0230  0070  0071  0072  0073  0004  0024  0076  0341
    0240  0101  0102  0103  0104  0105  0106  0107  0110
    0250  0111  0121  0122  0123  0124  0125  0126  0127
    0260  0130  0131  0142  0143  0144  0145  0146  0147
    0270  0150  0151  0160  0161  0162  0163  0164  0165
    0300  0166  0167  0170  0200  0212  0213  0214  0215
    0310  0216  0217  0220  0232* 0233  0234  0235  0236
    0320  0237  0240  0252  0253  0254  0255* 0256  0257
    0330  0260  0261  0262  0263  0264  0265  0266  0267
    0340  0270  0271  0272  0273  0274  0275* 0276  0277
    0350  0312  0313  0314  0315  0316  0317  0332  0333
    0360  0334  0335  0336  0337  0352  0353  0354  0355
    0370  0356  0357  0372  0373  0374  0375  0376  0377

Entries marked * differ between Table 4-4 and Table 4-5.

The Options subclause is listed as ``None'' because there are no options recognized by historical dd utilities. Certainly, many of the operands could have been designed to use the Utility Syntax Guidelines, which would have resulted in the classic hyphenated option letters. In this version of this standard, dd retains its curious JCL-like syntax due to the large number of applications that depend on the historical implementation. ``Fixing'' the interface would cause an excessive compatibility problem. However, due to interest in the international community, the developers of the standard have agreed to provide an alternative syntax for the next version of this standard that conforms to the spirit of the Utility Syntax Guidelines. This new syntax will be accompanied by the existing syntax, marked as obsolescent. System implementors are encouraged to develop and promulgate a new syntax for dd, perhaps using a different utility name, that can be adopted for the next version of this standard.

The default ibs= and obs= sizes are specified as 512 bytes because there are existing (largely portable) scripts that assume these values. If they were left unspecified, very strange results could occur if an implementation chose an odd block size.

Historical implementations of dd used creat() when processing of=file. This makes the seek= operand unusable except on special files. More recent BSD-based implementations use open() (without O_TRUNC) instead of creat(), but fail to delete the output file contents beyond the data copied. Since balloting showed a desire to make this behavior available, the conv=notrunc feature was added.

The w multiplier (historically meaning word) is used in System V to mean 2 and in 4.2BSD to mean 4.
Since word is inherently nonportable, its use is not supported by POSIX.2.

All references to US ASCII and to conversions to/from IBM and EBCDIC were removed in preparation for this document's acceptance by the international community. Implementations are free to have such conversions as extensions, using the ascii, ibm, and ebcdic keywords. However, in the interest of promoting consistency of implementation, the original material from an early draft has been restored to the rationale as an example:

    In the two tables, the conversions from ASCII to either standard EBCDIC (Table 4-4) or the IBM version of EBCDIC (Table 4-5) are shown. The entries that differ between the two tables are highlighted. In both tables, the ASCII value is the sum of the row and column headers, and the EBCDIC values are found at their intersections. For example, ASCII 0012 (LF) is in the second row (0010), third column (2), yielding 0045 in EBCDIC. The inverted tables (for EBCDIC to ASCII conversion) are not shown, but are in one-to-one correspondence with these tables. The tables are understood to match recent System V conversion algorithms, and there have been reports that earlier System V versions and the BSD version do not always conform to these; however, representatives of the BSD development group have agreed that a future version of their system will use these tables for consistency with System V.

    The cbs operand is required if any of the ascii, ebcdic, or ibm operands are specified. For the ascii operand, the input is handled as described for the unblock operand except that characters are converted to ASCII before the trailing <space>s are deleted.
    For the ebcdic and ibm operands, the input is handled as described for the block operand except that the characters are converted to EBCDIC or IBM EBCDIC after the trailing <space>s are added.

The block and unblock keywords are from historical BSD practice.

Early drafts only allowed two numbers separated by x to be used in a product when specifying bs=, cbs=, ibs=, and obs= sizes. This was changed to reflect the historical practice of allowing multiple numbers in the product as provided by Version 7 and all releases of System V and BSD.

END_RATIONALE

4.17 diff - Compare two files

4.17.1 Synopsis

    diff [-c | -e | -C n] [-br] file1 file2

4.17.2 Description

The diff utility shall compare the contents of file1 and file2 and write to standard output a list of changes necessary to convert file1 into file2. This list should be minimal. No output shall be produced if the files are identical.

4.17.3 Options

The diff utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-b
    Cause trailing <blank>s to be ignored and other strings of <blank>s to compare equal.

-c
    Produce output in a form that provides three lines of context.

-C n
    Produce output in a form that provides n lines of context (where n shall be interpreted as a positive decimal integer).

-e
    Produce output in a form suitable as input for the ed utility (see 4.20), which can then be used to convert file1 into file2.

-r
    Apply diff recursively to files and directories of the same name when file1 and file2 are both directories.

4.17.4 Operands

The following operands shall be supported by the implementation:

file1 file2
    A pathname of a file to be compared.
    If either the file1 or file2 operand is -, the standard input shall be used in its place.

    If both file1 and file2 are directories, diff shall not compare block special files, character special files, or FIFO special files to any files and shall not compare regular files to directories. The system documentation shall specify the behavior of diff on implementation-specific file types not specified by POSIX.1 {8} when found in directories. Further details are as specified in 4.17.6.1.1.

    If only one of file1 and file2 is a directory, diff shall be applied to the nondirectory file and the file contained in the directory file with a filename that is the same as the last component of the nondirectory file.

4.17.5 External Influences

4.17.5.1 Standard Input

The standard input shall be used only if one of the file1 or file2 operands references standard input. See Input Files.

4.17.5.2 Input Files

The input files shall be text files.

4.17.5.3 Environment Variables

The following environment variables shall affect the execution of diff:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.
LC_TIME
    This variable shall determine the locale for the format of file time stamps written with the -C and -c options.

TZ
    This variable shall determine the time zone used for calculating file time stamps written with the -C and -c options.

4.17.5.4 Asynchronous Events

Default.

4.17.6 External Effects

4.17.6.1 Standard Output

4.17.6.1.1 diff Directory Comparison Format

If both file1 and file2 are directories, the following output formats shall be used.

In the POSIX Locale, each file that is present in only one directory shall be reported using the following format:

    "Only in %s: %s\n", <directory pathname>, <filename>

In the POSIX Locale, subdirectories that are common to the two directories may be reported with the following format:

    "Common subdirectories: %s and %s\n", <directory1 pathname>, <directory2 pathname>

For each file common to the two directories, if the two files are not to be compared, the following format shall be used in the POSIX Locale:

    "File %s is a %s while file %s is a %s\n", <directory1 pathname>, <file type of directory1 pathname>, <directory2 pathname>, <file type of directory2 pathname>

For each file common to the two directories, if the files are to be compared and are identical, no output shall be written. If the two files differ, the following format shall be written:

    "diff %s %s %s\n", <diff_options>, <filename1>, <filename2>

where <diff_options> are the options as specified on the command line. Depending on these options, one of the following output formats shall be used to write the differences.
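The directory formats above can be observed by recreating a scaled-down version of the dir1/dir2 layout used in the rationale for diff. A POSIX shell sketch, assuming a writable temporary directory; the file contents are placeholders. The check below looks only for lines this subclause requires, because the optional ``Common subdirectories'' line and the order of entries may vary between implementations.

```sh
d=${TMPDIR:-/tmp}/diff_dir_demo.$$
mkdir -p "$d/dir1/x" "$d/dir2/x"
printf 'Mon\n' > "$d/dir1/x/date.out"     # differs between the two trees
printf 'Tue\n' > "$d/dir2/x/date.out"
printf 'new\n' > "$d/dir2/x/y"            # present only under dir2

# Run from $d so that reported pathnames are relative, as in the rationale.
out=$(cd "$d" && LC_ALL=POSIX diff -r dir1 dir2)
diff_status=$?                            # 1 means differences were found
printf '%s\n' "$out"

rm -rf "$d"
```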
All directory pathnames listed in this subclause shall be relative to the original command line arguments. All other names of files listed in this subclause shall be filenames (pathname components).

4.17.6.1.2 diff Default Output Format

The default (without -e, -c, or -C options) diff utility output contains lines of these forms:

    "%da%d\n", <num1>, <num2>
    "%da%d,%d\n", <num1>, <num2>, <num3>
    "%dd%d\n", <num1>, <num2>
    "%d,%dd%d\n", <num1>, <num2>, <num3>
    "%dc%d\n", <num1>, <num2>
    "%d,%dc%d\n", <num1>, <num2>, <num3>
    "%dc%d,%d\n", <num1>, <num2>, <num3>
    "%d,%dc%d,%d\n", <num1>, <num2>, <num3>, <num4>

These lines resemble ed subcommands to convert file1 into file2. The line numbers before the action letters shall pertain to file1; those after shall pertain to file2. Thus, by exchanging 'a' for 'd' and reading the line in reverse order, one can also determine how to convert file2 into file1. As in ed, identical pairs (where num1 = num2) are abbreviated as a single number.

Following each of these lines, diff shall write to standard output all lines affected in the first file using the format:

    "< %s", <line>

and all lines affected in the second file using the format:

    "> %s", <line>

If there are lines affected in both file1 and file2 (as with the c subcommand), the changes are separated with a line consisting of three hyphens:

    "---\n"

4.17.6.1.3 diff -e Output Format

With the -e option, a script shall be produced that shall, when provided as input to ed (see 4.20), along with an appended w (write) command, convert file1 into file2.
Only the a (append), c (change), d (delete), i (insert), and s (substitute) commands of ed shall be used in this script. Text line(s), except those consisting of the single character period (.), shall be output as they appear in the file.

4.17.6.1.4 diff -c or -C Output Format

With the -c or -C option, the output format shall consist of affected lines along with surrounding lines of context. The affected lines shall show which ones need to be deleted or changed in file1, and those added from file2. With the -c option, three lines of context, if available, shall be written before and after the affected lines. With the -C option, the user can specify how many lines of context shall be written. The exact format follows.

The name and last modification time of each file shall be output in the following format:

    "*** %s %s\n", file1, <file1 time stamp>
    "--- %s %s\n", file2, <file2 time stamp>

and a string of 15 asterisks:

    "***************\n"

Each <file> field shall be the pathname of the corresponding file being compared. The pathname written for standard input is unspecified. In the POSIX Locale, each <time stamp> field shall be equivalent to the output from the following command:

    date "+%a %b %e %T %Y"

without the trailing <newline>, executed at the time of last modification of the corresponding file (or the current time, if the file is standard input).

Then, the following output formats shall be applied for every set of changes.

First, the range of lines in file1 shall be written in the following format:

    "*** %d,%d ****\n", <beginning line number>, <ending line number>

Next, the affected lines along with lines of context (unaffected lines) shall be written.
Unaffected lines shall be written in the following format:

    "  %s", <unaffected_line>

Deleted lines shall be written as:

    "- %s", <deleted_line>

Changed lines shall be written as:

    "! %s", <changed_line>

Next, the range of lines in file2 shall be written in the following format:

    "--- %d,%d ----\n", <beginning line number>, <ending line number>

Then, lines of context and changed lines shall be written as described in the previous formats. Lines added from file2 shall be written in the following format:

    "+ %s", <added_line>

4.17.6.2 Standard Error

Used only for diagnostic messages.

4.17.6.3 Output Files

None.

4.17.7 Extended Description

None.

4.17.8 Exit Status

The diff utility shall exit with one of the following values:

    0    No differences were found.
    1    Differences were found.
    >1   An error occurred.

4.17.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.17.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

If lines at the end of a file are changed and other lines are added, diff output may show this as a delete and add, as a change, or as a change and add; diff is not expected to know which happened, and users should not care about the difference in output as long as it clearly shows the differences between the files.

If dir1 is a directory containing a directory named x, dir2 is a directory containing a directory named x, dir1/x and dir2/x both contain files named date.out, and dir2/x contains a file named y, the command:

    diff -r dir1 dir2

could produce output similar to:
    Common subdirectories: dir1/x and dir2/x
    Only in dir2/x: y
    diff -r dir1/x/date.out dir2/x/date.out
    1c1
    < Mon Jul 2 13:12:16 PDT 1990
    ---
    > Tue Jun 19 21:41:39 PDT 1990

History of Decisions Made

The -h option was removed because it was insufficiently specified and it does not add to application portability.

Current implementations employ algorithms that do not always produce a minimum list of differences. The current language about making every effort is the best the standard can do, as there is no metric that could be employed to judge the quality of implementations against any and all file contents. The statement ``This list should be minimal'' clearly implies that implementations are not expected to provide the following output when comparing two 100-line files that differ in only one character on a single line:

    1,100c1,100
    all 100 lines from file1 preceded with "< "
    ---
    all 100 lines from file2 preceded with "> "

The ``Only in'' messages required by this standard when the -r option is specified are not used by most historical implementations if the -e option is also specified. They are required here because they provide information that is needed to update a target directory hierarchy to match a source hierarchy.

The ``Common subdirectories'' messages are written by System V and 4.3BSD when the -r option is specified. They are allowed here, but are not required, because they report on something that is the same rather than on a difference, and are not needed to update a target hierarchy.

The -c option, which writes output in a format using lines of context, has been included. The format is useful for a variety of reasons, among them much improved readability and the ability to understand changes when the target file has line numbers that differ from another similar, but slightly different, copy.
An important utility, patch, which has proved itself indispensable to the USENET community, often works only with difference listings in the context format.

The BSD version of -c takes an optional argument specifying the amount of context. Rather than overloading -c and breaking the Utility Syntax Guidelines for diff, the working group decided to add a separate option (-C) for requesting a context diff with a specified amount of context. Also, the format for context diffs was extended slightly in 4.3BSD to allow multiple changes that are within context lines of each other to be merged together. The output format contains an additional four asterisks after the range of affected lines in the first filename. This was to provide a flag for old programs (like old versions of patch) that only understand the old context format. The version of context described here does not require that multiple changes within context lines be merged, but does not prohibit it either. The extension is upward compatible, so any vendors that wish to retain the old version of diff can do so by just adding the extra four asterisks (that is, utilities that currently use diff and understand the new merged format will also understand the old unmerged format, but not vice versa).

The substitute command was added as an additional format for the -e option. It was added to give implementations a way to fix the classic ``dot alone on a line'' bug present in many versions of diff. Since many implementations have fixed this bug, the working group decided not to standardize broken behavior, but rather to provide the necessary tool for fixing the bug.
One way to fix this bug is to output two periods whenever a lone period is needed, then terminate the append command with a period, and then use the substitute command to convert the two periods into one period.

The -f flag was not included, as it provides no additional functionality over the -e option.

The BSD-derived -r option was added to provide a mechanism for using diff to compare two file system trees. This behavior is useful, is standard practice on all BSD-derived systems, and is not easily reproducible with the find utility.

The requirement that diff not compare files in some circumstances, even though they have the same name, was added in response to ballot objections and after closer examination of the actual output of historical implementations. The message specified here is already in use when a directory is being compared to a nondirectory. It is extended here to preclude the problems arising from running into FIFOs and other files that would cause diff to hang waiting for input, with no indication to the user that diff was hung. In most common usage, diff -r should indicate differences in the file hierarchies, not differences in the contents of devices pointed to by the hierarchies.

Many early implementations of diff require seekable files. Since POSIX.1 {8} supports named pipes, the working group decided that such a restriction was unreasonable. Note also that the allowed file name - almost always refers to a pipe.

No directory search order is being specified in 4.17.6.1.1. The historical ordering is, in fact, not optimal, in that it prints out all of the differences at the current level, including the statements about all common subdirectories, before recursing into those subdirectories.
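The two-period workaround described above can be sketched as a shell function that writes an ed append command for a list of text lines. This is a minimal illustration under stated assumptions. The function name emit_append and its argument convention are hypothetical, and the sketch does not reproduce the output of any particular diff implementation.

```sh
# emit_append LINE TEXT...   (hypothetical helper, not part of diff)
# Writes an ed script fragment that appends each TEXT argument after
# line LINE. A text line consisting of a lone period would end input
# mode early, so it is written as ".." and repaired afterwards with a
# substitute command, following the workaround in the rationale.
emit_append() {
    line=$1
    shift
    printf '%sa\n' "$line"
    n=$line
    fixes=
    for text in "$@"; do
        n=$((n + 1))            # buffer line this text will occupy
        if [ "$text" = "." ]; then
            printf '%s\n' '..'  # placeholder for the lone period
            fixes="$fixes $n"
        else
            printf '%s\n' "$text"
        fi
    done
    printf '%s\n' '.'           # terminate input mode
    # Turn each ".." placeholder back into a single period.
    for f in $fixes; do
        printf '%ss/\\.\\././\n' "$f"
    done
}
```

For example, emit_append 2 hello . world writes the fragment below. Line 4 holds the placeholder, and the final substitute command turns it back into a lone period:

    2a
    hello
    ..
    world
    .
    4s/\.\././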
The message

    "diff %s %s %s\n", <diff_options>, <filename1>, <filename2>

does not vary by locale because it is the representation of a command, not an English sentence.

END_RATIONALE

4.18 dirname - Return directory portion of pathname

4.18.1 Synopsis

    dirname string

4.18.2 Description

The string operand shall be treated as a pathname, as defined in 2.2.2.102. It shall be converted to the name of the directory containing the filename corresponding to the last pathname component in string, performing actions equivalent to the following steps in order:

    (1) If string is //, skip steps (2) through (5).
    (2) If string consists entirely of slash characters, string shall be set to a single slash character. In this case, skip steps (3) through (8).
    (3) If there are any trailing slash characters in string, they shall be removed.
    (4) If there are no slash characters remaining in string, string shall be set to a single period character. In this case, skip steps (5) through (8).
    (5) If there are any trailing nonslash characters in string, they shall be removed.
    (6) If the remaining string is //, it is implementation defined whether steps (7) and (8) are skipped or processed.
    (7) If there are any trailing slash characters in string, they shall be removed.
    (8) If the remaining string is empty, string shall be set to a single slash character.

The resulting string shall be written to standard output.

4.18.3 Options

None.

4.18.4 Operands

The following operand shall be supported by the implementation:

    string    A string.

4.18.5 External Influences

4.18.5.1 Standard Input

None.

4.18.5.2 Input Files

None.
4.18.5.3 Environment Variables

The following environment variables shall affect the execution of dirname:

    LANG         This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
    LC_ALL       This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
    LC_CTYPE     This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).
    LC_MESSAGES  This variable shall determine the language in which messages should be written.

4.18.5.4 Asynchronous Events

Default.

4.18.6 External Effects

4.18.6.1 Standard Output

The dirname utility shall write a line to the standard output in the following format:

    "%s\n", <resulting string>

4.18.6.2 Standard Error

Used only for diagnostic messages.

4.18.6.3 Output Files

None.

4.18.7 Extended Description

None.

4.18.8 Exit Status

The dirname utility shall exit with one of the following values:

     0    Successful completion.
    >0    An error occurred.

4.18.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.18.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The dirname utility originated in System III. It has evolved through the System V releases to a version that matches the requirements specified in this description in System V Release 3.
4.3BSD and earlier versions did not include dirname.

Table 4-6 indicates the results required for some invocations of dirname.

                  Table 4-6 - dirname Examples

    Command              Results
    -------------------  ---------------
    dirname /            /
    dirname //           / or //
    dirname /a/b/        /a
    dirname //a//b//     //a
    dirname              Unspecified
    dirname a            .   ($? = 0)
    dirname ""           .   ($? = 0)
    dirname /a           /
    dirname /a/b         /a
    dirname a/b          a

History of Decisions Made

The behaviors of basename and dirname in this standard have been coordinated so that when string is a valid pathname

    $(basename "string")

would be a valid filename for the file in the directory

    $(dirname "string")

This would not work for the versions of these utilities in earlier drafts due to the way processing of trailing slashes was specified. Consideration was given to leaving processing unspecified if there were trailing slashes, but this cannot be done, because the POSIX.1 {8} definition of pathname allows trailing slashes. The basename and dirname utilities have to specify consistent handling for all valid pathnames.

Since the definition of pathname in 2.2.2.102 specifies implementation-defined behavior for pathnames starting with two slash characters, Draft 11 has been changed to specify similar implementation-defined behavior for the basename and dirname utilities. On implementations where the pathname // is always treated the same as the pathname /, the functionality required by Draft 10 meets all of the Draft 11 requirements.
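Steps (1) through (8) of 4.18.2 can be traced with POSIX shell parameter expansion alone. The sketch below is an illustration, not a reference implementation. The function name posix_dirname is hypothetical. Where step (6) leaves the choice to the implementation, the sketch assumes that steps (7) and (8) are skipped, so // is written unchanged. The invocations in Table 4-6 can be used to check it.

```sh
# posix_dirname STRING   (hypothetical name, illustration only)
# Follows steps (1)-(8) of 4.18.2. Where step (6) is implementation
# defined, this sketch keeps "//" by skipping steps (7) and (8).
posix_dirname() {
    s=$1
    # Step (1): the string "//" goes straight to step (6).
    if [ "$s" != "//" ]; then
        # Step (2): a nonempty string of only slashes becomes "/".
        case $s in
            '' | *[!/]*) ;;
            *) printf '%s\n' /; return 0 ;;
        esac
        # Step (3): remove trailing slashes.
        while [ "${s%/}" != "$s" ]; do
            s=${s%/}
        done
        # Step (4): with no slash left, the directory is ".".
        case $s in
            */*) ;;
            *) printf '%s\n' .; return 0 ;;
        esac
        # Step (5): remove the trailing nonslash characters.
        s=${s%"${s##*/}"}
    fi
    # Step (6): implementation defined for "//"; this sketch keeps it.
    if [ "$s" = "//" ]; then
        printf '%s\n' //
        return 0
    fi
    # Step (7): remove trailing slashes again.
    while [ "${s%/}" != "$s" ]; do
        s=${s%/}
    done
    # Step (8): an empty result names the root directory.
    [ -n "$s" ] || s=/
    printf '%s\n' "$s"
}
```

With this choice, posix_dirname // writes //. An implementation that processes steps (7) and (8) instead would write /, which Table 4-6 also permits.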
END_RATIONALE

4.19 echo - Write arguments to standard output

4.19.1 Synopsis

    echo [string ...]

4.19.2 Description

The echo utility shall write its arguments to standard output, followed by a <newline> character. If there are no arguments, only the <newline> character shall be written.

4.19.3 Options

The echo utility shall not recognize the -- argument in the manner specified by utility syntax guideline 10 in 2.10.2; -- shall be recognized as a string operand.

Implementations need not support any options.

4.19.4 Operands

The following operands shall be supported by the implementation:

    string    A string to be written to standard output. If the first operand is "-n" or if any of the operands contain a backslash (\) character, the results are implementation defined.

4.19.5 External Influences

4.19.5.1 Standard Input

None.

4.19.5.2 Input Files

None.

4.19.5.3 Environment Variables

The following environment variables shall affect the execution of echo:

    LANG         This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
    LC_ALL       This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
    LC_MESSAGES  This variable shall determine the language in which diagnostic messages should be written.

4.19.5.4 Asynchronous Events

Default.

4.19.6 External Effects

4.19.6.1 Standard Output

The echo utility arguments shall be separated by single <space>s, and a <newline> character shall follow the last argument.
4.19.6.2 Standard Error

Used only for diagnostic messages.

4.19.6.3 Output Files

None.

4.19.7 Extended Description

None.

4.19.8 Exit Status

The echo utility shall exit with one of the following values:

     0    Successful completion.
    >0    An error occurred.

4.19.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.19.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

As specified by this standard, echo writes its arguments in the simplest of ways. The two different historical versions of echo vary in fatally incompatible ways.

The BSD echo checks the first argument for the string "-n", which causes it to suppress the <newline> character that would otherwise follow the final argument in the output.

The System V echo does not support any options, but allows escape sequences within its operands:

    \a      Write an <alert> character.
    \b      Write a <backspace> character.
    \c      Suppress the <newline> character that otherwise follows the final argument in the output. All characters following the \c in the arguments are ignored.
    \f      Write a <form-feed> character.
    \n      Write a <newline> character.
    \r      Write a <carriage-return> character.
    \t      Write a <tab> character.
    \v      Write a <vertical-tab> character.
    \\      Write a backslash character.
    \0num   Write an 8-bit value that is the 1-, 2-, or 3-digit octal number num.

It is not possible to use echo portably across these two implementations unless both -n (as the first argument) and escape sequences are omitted.
The printf utility (see 4.50) can be used to portably emulate any of the traditional behaviors of the echo utility as follows:

  - The System V echo is equivalent to:

        printf "%b\n" "$*"

  - The BSD echo is equivalent to:

        if [ "X$1" = "X-n" ]
        then
            shift
            printf "%s" "$*"
        else
            printf "%s\n" "$*"
        fi

The echo utility does not support utility syntax guideline 10 because existing applications depend on echo to echo all of its arguments, except for the -n option in the BSD version.

New applications are encouraged to use printf instead of echo. The echo utility has not been made obsolescent because of its extremely widespread use in existing applications.

History of Decisions Made

In Draft 8, an attempt was made to merge the extensions of BSD and System V, supporting both -n and escape sequences. During initial ballot resolution, a -e option was proposed to enable the escape conventions. Both attempts failed, as there are historical scripts that would be broken by any attempt at reconciliation. Therefore, in Draft 9 only the simplest version of echo is presented. Implementation-defined extensions on BSD and System V will keep historical applications content. Portable applications that wish to do prompting without <newline>s, or that could possibly be expecting to echo a "-n", should use the new printf utility (see 4.50), derived from the Ninth Edition.

The LC_CTYPE variable is not cited because echo, as specified here, does not need to understand the characters in its arguments. The System V and BSD implementations might need to be sensitive to it because of their extensions.
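The following function sketches the printf-based prompting recommended above. It writes a prompt with no trailing <newline> and then reads the reply. The name ask and its two-argument convention are hypothetical.

```sh
# ask PROMPT NAME   (hypothetical helper)
# Writes PROMPT with no trailing <newline>, which echo cannot do
# portably, then reads one line of standard input into the variable
# whose name is NAME. IFS is emptied for the read so that leading and
# trailing blanks in the reply are kept.
ask() {
    printf '%s' "$1"
    IFS= read -r "$2"
}
```

A typical call is ask 'Continue? [y/n] ' reply, followed by a test of "$reply". Similarly, printf '%s\n' -n writes the string -n, and no single echo invocation can guarantee that result.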
END_RATIONALE

4.20 ed - Edit text

4.20.1 Synopsis

    ed [-p string] [-s] [file]

    Obsolescent Version:

    ed [-p string] [-] [file]

4.20.2 Description

The ed utility is a line-oriented text editor that shall use two modes: command mode and input mode. In command mode the input characters shall be interpreted as commands, and in input mode they shall be interpreted as text. See 4.20.7.

4.20.3 Options

The ed utility shall conform to the utility argument syntax guidelines described in 2.10.2, except for its nonstandard usage of - in the obsolescent version.

The following options shall be supported by the implementation:

    -p string    Use string as the prompt string when in command mode. By default, there shall be no prompt string.
    -s           Suppress the writing of byte counts by e, E, r, and w commands and of the ! prompt after a !command.
    -            (Obsolescent.) Equivalent to the -s option.

4.20.4 Operands

The following operand shall be supported by the implementation:

    file    If the file argument is given, ed shall simulate an e command on the file named by the pathname, file, before accepting commands from the standard input.

4.20.5 External Influences

4.20.5.1 Standard Input

The standard input shall be a text file consisting of commands, as described in 4.20.7.

4.20.5.2 Input Files

The input files shall be text files.

4.20.5.3 Environment Variables

The following environment variables shall affect the execution of ed:

    HOME         This variable shall determine the pathname of the user's home directory.
    LANG         This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
    LC_ALL       This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
    LC_COLLATE   This variable shall determine the locale for the behavior of ranges, equivalence classes, and multicharacter collating elements within regular expressions.
    LC_CTYPE     This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files) and the behavior of character classes within regular expressions.
    LC_MESSAGES  This variable shall determine the language in which messages should be written.

4.20.5.4 Asynchronous Events

The ed utility shall take the standard action for all signals (see 2.11.5.4), with the following exceptions:

    SIGINT    The ed utility shall interrupt its current activity, write the string "?\n" to standard output, and return to command mode (see 4.20.7).
    SIGHUP    If the buffer is not empty and has changed since the last write, the ed utility shall attempt to write a copy of the buffer in a file. First, the file named ed.hup in the current directory shall be used; if that fails, the file named ed.hup in the directory named by the HOME environment variable shall be used. In any case, the ed utility shall exit without returning to command mode.

4.20.6 External Effects

4.20.6.1 Standard Output

Various editing commands and the prompting feature (see -p) write to standard output, as described in 4.20.7.

4.20.6.2 Standard Error

Used only for diagnostic messages.
4.20.6.3 Output Files

The output files shall be text files whose formats are dependent on the editing commands given.

4.20.7 Extended Description

The ed utility shall operate on a copy of the file it is editing; changes made to the copy shall have no effect on the file until a w (write) command is given. The copy of the text is called the buffer in this clause, although no attempt is made to imply a specific implementation.

Commands to ed have a simple and regular structure: zero, one, or two addresses followed by a single-character command, possibly followed by parameters to that command. These addresses specify one or more lines in the buffer. Every command that requires addresses has default addresses, so the addresses can very often be omitted. If the -p option is specified, the prompt string shall be written to standard output before each command is read.

In general, only one command can appear on a line. Certain commands allow text to be input. This text is placed in the appropriate place in the buffer. While ed is accepting text, it is said to be in input mode. In this mode, no commands shall be recognized; all input is merely collected. Input mode is terminated by entering a line consisting of two characters: a period (.) followed by a <newline>. This line is not considered part of the input text.

4.20.7.1 ed Regular Expressions

The ed utility shall support basic regular expressions, as described in 2.8.3. Since regular expressions in ed are always matched against single lines, never against any larger section of text, there is no way for a regular expression to match a <newline>.

A null RE shall be equivalent to the last RE encountered.

Regular expressions are used in addresses to specify lines, and in some commands (for example, the s substitute command) to specify portions of a line to be substituted.
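The null RE rule can be seen in a short non-interactive ed session. This sketch assumes an ed on the search path that reads commands from standard input and accepts the -s option described in 4.20.3. The function name first_todo_done is hypothetical. The address /TODO/ uses the forward search described in 4.20.7.2.

```sh
# first_todo_done FILE   (hypothetical helper)
# Changes TODO to DONE on the first line of FILE that contains TODO.
# After ed reads FILE, the current line is the last line, so the
# forward search /TODO/ wraps around to the top of the buffer. The
# empty RE in s//DONE/ reuses TODO, the last RE encountered.
first_todo_done() {
    printf '%s\n' '/TODO/s//DONE/' 'w' 'q' | ed -s "$1"
}
```

Applied to a file containing the lines keep, TODO a, and TODO b, the helper changes only the second line, leaving keep, DONE a, and TODO b.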
4.20.7.2 ed Addresses

Addressing in ed relates to the current line. Generally, the current line is the last line affected by a command. The current line number is the address (line number) of the current line. The exact effect on the current line number is discussed under the description of each command. The f, h, H, k, P, w, =, and ! commands shall not modify the current line number.

Addresses are constructed as follows:

    (1) The character . (period) shall address the current line.
    (2) The character $ shall address the last line of the buffer.
    (3) A positive decimal number n shall address the n-th line of the buffer. The first line in the buffer is line number 1.
    (4) 'x shall address the line marked with the mark name character x, which shall be a lowercase letter from the portable character set. Lines can be marked with the k command described in 4.20.7.3.13.
    (5) An RE enclosed by slashes (/) shall address the first line found by searching forward from the line following the current line toward the end of the buffer and stopping at the first line containing a string matching the RE. [As stated in 4.20.7.1, an address consisting of a null RE delimited by slashes (//) shall address the next line containing the last RE encountered.] If necessary, the search shall wrap around to the beginning of the buffer and continue up to and including the current line, so that the entire buffer is searched. Within the RE, the sequence \/ shall represent a literal slash instead of the RE delimiter.
    (6) An RE enclosed in question-marks (?) shall address the first line found by searching backward from the line preceding the current line toward the beginning of the buffer and stopping at the first line containing a string matching the RE. If necessary, the search wraps around to the end of the buffer and continues up to and including the current line.
        Within the RE, the sequence \? shall represent a literal question-mark instead of the RE delimiter.
    (7) An address followed by a plus sign (+) or a minus sign (-) followed by a decimal number specifies that address plus (respectively minus) the indicated number of lines. The plus sign can be omitted.
    (8) If an address begins with + or -, the addition or subtraction is taken with respect to the current line number; for example, -5 is understood to mean .-5.
    (9) If an address ends with + or -, then 1 shall be added to or subtracted from the address, respectively. As a consequence of this rule and of rule (8) immediately above, the address - shall refer to the line preceding the current line. Moreover, trailing + and - characters shall have a cumulative effect, so -- shall refer to the current line number less 2.
    (10) A comma (,) shall stand for the address pair 1,$, while a semicolon (;) shall stand for the pair .,$.

Commands require zero, one, or two addresses. Commands that require no addresses shall regard the presence of an address as an error. Commands that accept one or two addresses assume default addresses when no addresses are given, as described in 4.20.7.3. If one address is given to a command that allows two addresses, the command shall operate as if it were specified as:

    given_address;. command

If more addresses are given than such a command requires, the results are undefined.

Typically, addresses are separated from each other by a comma. They can also be separated by a semicolon. In the latter case, the current line number (.) shall be set to the first address, and only then shall the second address be calculated. This feature can be used to determine the starting line for forward and backward searches [see rules (5) and (6) above].
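The effect of the semicolon separator can be sketched with a hypothetical helper, delete_block. Like the other examples here, it assumes an ed on the search path that accepts -s and reads commands from standard input. The RE arguments are inserted into the script unchanged, so they are assumed not to contain a slash.

```sh
# delete_block FILE START END   (hypothetical helper)
# Deletes the lines from the first one matching the basic RE START
# through the next one matching END. Because ";" sets the current line
# to the first address before the second is computed, /END/ is
# searched for after the START line, not after the line that was
# current when the command began.
delete_block() {
    printf '%s\n' "/$2/;/$3/d" 'w' 'q' | ed -s "$1"
}
```

Applied to a file whose lines are end of header, begin, body, end, and tail, delete_block FILE begin end leaves end of header and tail. Under the rules above, writing /begin/,/end/d instead would start both searches from the last line. The search for /end/ would then wrap around to line 1, and the resulting pair 2,1 would be rejected because the second address precedes the first.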
The second address of any two-address sequence shall correspond to a line that does not precede, in the buffer, the line corresponding to the first address.

4.20.7.3 ed Commands

In the following list of ed commands, the default addresses are shown in parentheses. The number of addresses shown in the default shall be the number expected by the command. The parentheses are not part of the address; they show that the given addresses are the default.

It is generally invalid for more than one command to appear on a line. However, any command (except e, E, f, q, Q, r, w, and !) can be suffixed by the letter l, n, or p. In that case, except for the l, n, and p commands, the command shall be executed and then the new current line shall be written as described below under the l, n, and p commands. When an l, n, or p suffix is used with an l, n, or p command, the command shall write to standard output as described below, but it is unspecified whether the suffix writes the current line again in the requested format or whether the suffix has no effect. For example, the pl command (base p command with an l suffix) shall either write just the current line or write it twice, once as specified for p and once as specified for l. Also, the g, G, v, and V commands shall take a command as a parameter.

Each address component can be preceded by zero or more <blank>s. The command letter can be preceded by zero or more <blank>s. If a suffix letter (l, n, or p) is given, it shall immediately follow the command.

The e, E, f, r, and w commands shall take an optional file parameter, separated from the command letter by one or more <blank>s.
If changes have been made in the buffer since the last w command that wrote the entire buffer, ed shall warn the user if an attempt is made to destroy the editor buffer via the e or q commands. The ed utility shall write the string:

    "?\n"

(followed by an explanatory message if help mode has been enabled via the H command) to standard output and shall continue in command mode with the current line number unchanged. If the e or q command is repeated with no intervening command, it shall take effect.

If an end-of-file is detected on standard input when a command is expected, the ed utility shall act as if a q command had been entered.

If the closing delimiter of an RE or of a replacement string (e.g., /) in a g, G, s, v, or V command would be the last character before a <newline>, that delimiter can be omitted, in which case the addressed line shall be written. For example, the following pairs of commands are equivalent:

    s/s1/s2     s/s1/s2/p
    g/s1        g/s1/p
    ?s1         ?s1?

If an invalid command is entered, ed shall write the string:

    "?\n"

(followed by an explanatory message if help mode has been enabled via the H command) to standard output and shall continue in command mode with the current line number unchanged.

4.20.7.3.1 Append Command

Synopsis:

    (.)a
    <text>
    .

The append command shall read the given text and append it after the addressed line. The current line number shall become the address of the last inserted line or, if there were none, the addressed line. Address 0 shall be valid for this command: it shall cause the ``appended'' text to be placed at the beginning of the buffer.

4.20.7.3.2 Change Command

Synopsis:

    (.,.)c
    <text>
    .
The change command shall delete the addressed lines, then accept input text that replaces these lines. The current line shall be set to the address of the last line input or, if there were none, to the line after the last line deleted. If the lines deleted were originally at the end of the buffer, the current line number shall be set to the address of the new last line. If no lines remain in the buffer, the current line number shall be set to zero.

4.20.7.3.3 Delete Command

Synopsis:

    (.,.)d

The delete command shall delete the addressed lines from the buffer. The address of the line after the last line deleted shall become the current line number. If the lines deleted were originally at the end of the buffer, the current line number shall be set to the address of the new last line. If no lines remain in the buffer, the current line number shall be set to zero.

4.20.7.3.4 Edit Command

Synopsis:

    e [file]

The edit command shall delete the entire contents of the buffer and then read in the file named by the pathname file. The current line number shall be set to the address of the last line of the buffer. If no pathname is given, the currently remembered pathname, if any, shall be used (see the f command). The number of bytes read shall be written to standard output, unless the -s option was specified, in the following format:

    "%d\n", <number of bytes read>

The name file shall be remembered for possible use as a default pathname in subsequent e, E, r, and w commands. If file is replaced by !, the rest of the line shall be taken to be a shell command line whose output is to be read. Such a shell command line shall not be remembered as the current file.
All marks shall be discarded upon the completion of a successful e command. If the buffer has changed since the last time the entire buffer was written, the user shall be warned, as described previously.

4.20.7.3.5 Edit Without Checking Command

Synopsis:

    E [file]

The Edit command shall possess all properties and restrictions of the e command except that the editor shall not check to see if any changes have been made to the buffer since the last w command.

4.20.7.3.6 File-Name Command

Synopsis:

    f [file]

If file is given, the file-name command shall change the currently remembered pathname to file; whether the name is changed or not, it then shall write the (possibly new) currently remembered pathname to the standard output in the following format:

    "%s\n", <pathname>

The current line number shall be unchanged.

4.20.7.3.7 Global Command

Synopsis:

    (1,$)g/RE/command list

In the global command, the first step shall be to mark every line that matches the given RE. Then, for every such line, the given command list shall be executed with the current line number set to the address of that line. When the g command completes, the current line number shall have the value assigned by the last command in the command list. If there were no matching lines, the current line number shall not be changed. A single command or the first of a list of commands shall appear on the same line as the global command. All lines of a multiline list except the last line shall be ended with a backslash; the a, i, and c commands and associated input are permitted. The . terminating input mode can be omitted if it would be the last line of the command list. An empty command list shall be equivalent to the p command. The use of the g, G, v, V, and ! commands in the command list produces undefined results. Any
character other than <space> or <newline> can be used instead of a slash to delimit the RE. Within the RE, the RE delimiter itself can be used as a literal character if it is preceded by a backslash.

4.20.7.3.8 Interactive Global Command

Synopsis:

    (1,$)G/RE/

In the interactive global command, the first step shall be to mark every line that matches the given RE. Then, for every such line, that line shall be written, the current line number shall be set to the address of that line, and any one command (other than one of the a, c, i, g, G, v, and V commands) can be input and shall be executed. A <newline> shall act as a null command (causing no action to be taken on the current line); an & shall cause the reexecution of the most recent nonnull command executed within the current invocation of G. Note that the commands input as part of the execution of the G command can address and affect any lines in the buffer. The final value of the current line number shall be the value set by the last command successfully executed. (Note that the last command successfully executed shall be the G command itself if a command fails or the null command is specified.) If there were no matching lines, the current line number shall not be changed. The G command can be terminated by a SIGINT signal. Any character other than <space> or <newline> can be used instead of a slash to delimit the RE. Within the RE, the RE delimiter itself can be used as a literal character if it is preceded by a backslash.

4.20.7.3.9 Help Command

Synopsis:

    h

The help command shall write a short message to standard output that explains the reason for the most recent ? notification. The current line number shall be unchanged.
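The global command works in two passes. First it marks every matching line, then it runs the command list at each marked line that still exists. The Python model below illustrates this under stated assumptions; it is a sketch, not ed's implementation. Python's re module stands in for POSIX BREs, whose syntax differs. The Line class, address_of(), delete_lines(), and global_command() are hypothetical names, and a callable stands in for the command list. delete_lines() follows the current-line rules given above for the d command.

```python
import re


class Line:
    """One buffer line; the global mark follows the object, not its address."""

    def __init__(self, text):
        self.text = text


def address_of(buffer, line):
    """Return the 1-based address of `line`, or None if it has been deleted."""
    for address, candidate in enumerate(buffer, start=1):
        if candidate is line:
            return address
    return None


def delete_lines(buffer, first, last):
    """(first,last)d: delete in place and return the new current line number."""
    was_at_end = last == len(buffer)
    del buffer[first - 1:last]
    if not buffer:
        return 0              # no lines remain
    if was_at_end:
        return len(buffer)    # the new last line
    return first              # the line after the deleted range now sits here


def global_command(buffer, pattern, command):
    """Two-pass model of (1,$)g/RE/command list.

    `command(buffer, address)` stands in for the command list: it may change
    any lines and returns the current line number it leaves behind.
    Returns None (current line unchanged) when nothing matched.
    """
    regex = re.compile(pattern)
    marked = [line for line in buffer if regex.search(line.text)]  # pass 1
    current = None
    for line in marked:                                            # pass 2
        address = address_of(buffer, line)
        if address is not None:   # skip lines an earlier command deleted
            current = command(buffer, address)
    return current
```

With g/foo/d on the lines foo, bar, foo2, baz, the model leaves bar and baz and reports current line 2. Tracking marks by identity rather than by address is what lets a command delete lines that were marked later.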

4.20.7.3.10 Help-Mode Command

Synopsis:

    H

The Help command shall cause ed to enter a mode in which help messages (see the h command) shall be written to standard output for all subsequent ? notifications. The H command alternately shall turn this mode on and off; it shall be initially off. If the help mode is being turned on, the H command also shall explain the previous ? notification, if there was one. The current line number shall be unchanged.

4.20.7.3.11 Insert Command

Synopsis:

    (.)i
    <text>
    .

The insert command shall insert the given text before the addressed line; the current line number shall be set to the address of the last inserted line or, if there was none, to the addressed line. This command differs from the a command only in the placement of the input text. Address 0 shall be invalid for this command.

4.20.7.3.12 Join Command

Synopsis:

    (.,.+1)j

The join command shall join contiguous lines by removing the appropriate <newline> characters. If exactly one address is given, this command shall do nothing. If lines are joined, the current line number shall be set to the address of the joined line; otherwise, the current line number shall be unchanged.

4.20.7.3.13 Mark Command

Synopsis:

    (.)kx

The mark command shall mark the addressed line with name x, which shall be a lowercase letter from the portable character set. The address 'x then shall refer to this line; the current line number shall be unchanged.

4.20.7.3.14 List Command

Synopsis:

    (.,.)l

The list command shall write to standard output the addressed lines in a visually unambiguous form. The characters listed in Table 2-15 (see 2.12) shall be written as the corresponding escape sequence.
Nonprintable characters not in Table 2-15 shall be written as one three-digit octal number (with a preceding <backslash>) for each byte in the character (most significant byte first). If the size of a byte on the system is greater than nine bits, the format used for nonprintable characters is implementation defined. Long lines shall be folded, with the point of folding indicated by writing <backslash><newline>; the length at which folding occurs is unspecified, but should be appropriate for the output device. The end of each line shall be marked with a $. An l command can be appended to any command other than e, E, f, q, Q, r, w, or !. The current line number shall be set to the address of the last line written.

4.20.7.3.15 Move Command

Synopsis:

    (.,.)maddress

The move command shall reposition the addressed line(s) after the line addressed by address. Address 0 shall be valid for address and shall cause the addressed line(s) to be moved to the beginning of the buffer. It shall be an error if address falls within the range of moved lines. The current line number shall be set to the address of the last line moved.

4.20.7.3.16 Number Command

Synopsis:

    (.,.)n

The number command shall write to standard output the addressed lines, preceding each line by its line number and a <tab> character; the current line number shall be set to the address of the last line written. The n command can be appended to any command other than e, E, f, q, Q, r, w, or !.

4.20.7.3.17 Print Command

Synopsis:

    (.,.)p

The print command shall write to standard output the addressed lines; the current line number shall be set to the address of the last line written. The p command can be appended to any command other than e, E, f, q, Q, r, w, or !.
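The l output format described in 4.20.7.3.14 can be sketched in Python. The sketch rests on three assumptions. Table 2-15 is taken to contain the C-style escapes shown in ESCAPES. The locale is taken to be single-byte, with only ASCII 0x20 through 0x7E printable. Folding happens at a hypothetical width of 72 columns. list_format() is an illustrative name, not part of any interface.

```python
# Assumption: Table 2-15 maps these characters to C-style escape sequences.
ESCAPES = {
    "\\": "\\\\", "\a": "\\a", "\b": "\\b", "\f": "\\f",
    "\r": "\\r", "\t": "\\t", "\v": "\\v",
}


def list_format(line, width=72):
    """Render one line (bytes, without its <newline>) in the style of l.

    Assumes a single-byte locale where only ASCII 0x20-0x7E is printable;
    `width` is a hypothetical folding column (the standard leaves it open).
    """
    pieces = []
    for byte in line:
        ch = chr(byte)
        if ch in ESCAPES:
            pieces.append(ESCAPES[ch])
        elif 0x20 <= byte <= 0x7E:
            pieces.append(ch)
        else:
            pieces.append("\\%03o" % byte)    # three octal digits per byte
    out, column = [], 0
    for piece in pieces:
        if column + len(piece) > width - 1:   # keep room for the folding backslash
            out.append("\\\n")                # fold with <backslash><newline>
            column = 0
        out.append(piece)
        column += len(piece)
    return "".join(out) + "$"                 # every line ends with $
```

An escape sequence is never split across a fold. A literal backslash always appears doubled, so a reader can decode the output unambiguously.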

4.20.7.3.18 Prompt Command

Synopsis:

    P

The Prompt command shall cause ed to prompt with an asterisk (*) (or string, if -p string is specified) for all subsequent commands. The P command alternately shall turn this mode on and off; it shall be initially on if the -p option is specified, otherwise off. The current line number shall be unchanged.

4.20.7.3.19 Quit Command

Synopsis:

    q

The quit command shall cause ed to exit. If the buffer has changed since the last time the entire buffer was written, the user shall be warned, as described previously.

4.20.7.3.20 Quit Without Checking Command

Synopsis:

    Q

The Quit command shall cause ed to exit without checking whether changes have been made in the buffer since the last w command.

4.20.7.3.21 Read Command

Synopsis:

    ($)r [file]

The read command shall read in the file named by the pathname file and append it after the addressed line. If no file argument is given, the currently remembered pathname, if any, shall be used (see the e and f commands). The currently remembered pathname shall not be changed unless there is no remembered pathname. Address 0 shall be valid for r and shall cause the file to be read at the beginning of the buffer. If the read is successful, and -s was not specified, the number of bytes read shall be written to standard output in the following format:

    "%d\n", <number of bytes read>

The current line number shall be set to the address of the last line read in. If file is replaced by !, the rest of the line shall be taken to be a shell command line whose output is to be read. Such a shell command line shall not be remembered as the current pathname.
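The warning that e and q give for a modified buffer (described at the start of this subclause) amounts to a small state machine. In the hypothetical Python model below, destroy() returns False where ed would write "?\n" and True where the command would take effect. E and Q would bypass the model entirely. An end-of-file where a command is expected would be fed in as q. The class and method names are illustrative only.

```python
class ModifiedBufferGuard:
    """Hypothetical model of the warning ed gives before e or q discards changes."""

    def __init__(self):
        self.modified = False   # changed since the last w of the entire buffer
        self.refused = None     # 'e' or 'q' if the previous command was refused

    def other_command(self, changes_buffer=False, wrote_entire_buffer=False):
        """Record any command other than e/q; it ends the repeat window."""
        self.refused = None
        if changes_buffer:
            self.modified = True
        if wrote_entire_buffer:     # a w !command never counts as this
            self.modified = False

    def destroy(self, cmd):
        """Handle e or q: True if it takes effect, False if ed writes "?\\n"."""
        if self.modified and self.refused != cmd:
            self.refused = cmd
            return False
        self.refused = None
        return True
```

The repeat only counts when it comes immediately after the refusal: any intervening command, even a p, re-arms the warning.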

4.20.7.3.22 Substitute Command

Synopsis:

    (.,.)s/RE/replacement/flags

The substitute command shall search each addressed line for an occurrence of the specified RE and replace either the first or all (nonoverlapped) matched strings with the replacement; see the following description of the g suffix. It is an error if the substitution fails on every addressed line. Any character other than <space> or <newline> can be used instead of a slash to delimit the RE and the replacement. Within the RE, the RE delimiter itself can be used as a literal character if it is preceded by a backslash. The current line shall be set to the address of the last line on which a substitution occurred.

An ampersand (&) appearing in the replacement shall be replaced by the string matching the RE on the current line. The special meaning of & in this context can be suppressed by preceding it by backslash. As a more general feature, the characters \n, where n is a digit, shall be replaced by the text matched by the corresponding backreference expression (see 2.8.3.3). When the character % is the only character in the replacement, the replacement used in the most recent substitute command shall be used as the replacement in the current substitute command; if there was no previous substitute command, the use of % in this manner shall be an error. The % shall lose its special meaning when it is in a replacement string of more than one character or is preceded by a backslash.

A line can be split by substituting a <newline> character into it. The application shall escape the <newline> in the replacement by preceding it by backslash. Such substitution cannot be done as part of a g or v command list.
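The replacement rules above can be modeled for a single line, together with the count and g flags described next. The sketch uses Python's re module in place of POSIX BREs, so subexpressions are written (...) rather than \(...\). expand_replacement() and substitute() are hypothetical helpers. A returned string containing "\n" represents a line that ed would split.

```python
import re


def expand_replacement(template, match):
    """Expand & and \\1-\\9 for one match; a backslash makes the next character literal."""
    out, i = [], 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt in "123456789":
                out.append(match.group(int(nxt)) or "")  # unmatched group -> ""
            else:
                out.append(nxt)   # \&, \%, \\ and an escaped <newline>
            i += 2
        elif ch == "&":
            out.append(match.group(0))
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def substitute(line, pattern, template, previous=None, count=None, global_=False):
    """Sketch of s/RE/replacement/flags on one line.

    Returns (new_text, performed, template_to_remember).
    """
    if template == "%":                  # a lone % reuses the last replacement
        if previous is None:
            raise ValueError("?")        # no previous substitute command
        template = previous
    if global_ and count is not None:
        raise ValueError("g together with a count is unspecified")
    if count is not None and count < 1:
        raise ValueError("?")
    matches = list(re.finditer(pattern, line))  # nonoverlapping, left to right
    if global_:
        chosen = matches
    else:
        n = 1 if count is None else count
        chosen = matches[n - 1:n]        # empty if there are fewer than n matches
    pieces, pos = [], 0
    for m in chosen:
        pieces.append(line[pos:m.start()])
        pieces.append(expand_replacement(template, m))
        pos = m.end()
    pieces.append(line[pos:])
    # An identical replacement still counts as a performed substitution.
    return "".join(pieces), bool(chosen), template
```

For example, substitute("x-x-x", "x", "X", count=2) yields "x-X-x". That is the count behavior the rationale later describes for s/x/X/2047.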

The current line number shall be set to the address of the last line on which a substitution is performed. If no substitution is performed, the current line number shall be unchanged. If a line is split, a substitution shall be considered to have been performed on each of the new lines for the purpose of determining the new current line number. A substitution shall be considered to have been performed even if the replacement string is identical to the string that it replaces.

The value of flags shall be zero or more of:

    count  Substitute for the countth occurrence only of the RE found on each addressed line.

    g      Globally substitute for all nonoverlapping instances of the RE rather than just the first one. If both g and count are specified, the results are unspecified.

    l      Write to standard output the final line in which a substitution was made. The line shall be written in the format specified for the l command.

    n      Write to standard output the final line in which a substitution was made. The line shall be written in the format specified for the n command.

    p      Write to standard output the final line in which a substitution was made. The line shall be written in the format specified for the p command.

4.20.7.3.23 Copy Command

Synopsis:

    (.,.)taddress

The t command shall be equivalent to the m command, except that a copy of the addressed lines shall be placed after the line addressed by address (which can be 0); the current line number shall be set to the address of the last line added.

4.20.7.3.24 Undo Command

Synopsis:

    u

The undo command shall nullify the effect of the most recent command that modified anything in the buffer, namely the most recent a, c, d, g, i, j, m, r, s, t, u, v, G, or V command.
All changes made to the buffer by a g, G, v, or V global command shall be ``undone'' as a single change; if no changes were made by the global command (such as with g/RE/p), the u command shall have no effect. The current line number shall be set to the value it had immediately before the command being undone started.

4.20.7.3.25 Global Non-Matched Command

Synopsis:

    (1,$)v/RE/command list

This command shall be equivalent to the global command g except that the lines that are marked during the first step shall be those that do not match the RE.

4.20.7.3.26 Interactive Global Not-Matched Command

Synopsis:

    (1,$)V/RE/

This command shall be equivalent to the interactive global command G except that the lines that are marked during the first step shall be those that do not match the RE.

4.20.7.3.27 Write Command

Synopsis:

    (1,$)w [file]

The write command shall write the addressed lines into the file named by the pathname file. The command shall create the file, if it does not exist, or shall replace the contents of the existing file. The currently remembered pathname shall not be changed unless there is no remembered pathname. If no pathname is given, the currently remembered pathname, if any, shall be used (see the e and f commands); the current line number shall be unchanged. If the command is successful, the number of bytes written shall be written to standard output, unless the -s option was specified, in the following format:

    "%d\n", <number of bytes written>

If file begins with !, the rest of the line shall be taken to be a shell command line whose standard input shall be the addressed lines. Such a shell command line shall not be remembered as the current pathname. This usage of the write command with ! shall not be considered as a ``last w command that wrote the entire buffer,'' as described previously; thus,
this alone shall not prevent the warning to the user if an attempt is made to destroy the editor buffer via the e or q command.

4.20.7.3.28 Line Number Command

Synopsis:

    ($)=

The line number of the addressed line shall be written to standard output in the following format:

    "%d\n", <line number>

The current line number shall be unchanged by this command.

4.20.7.3.29 Shell Escape Command

Synopsis:

    !command

The remainder of the line after the ! shall be sent to the command interpreter to be interpreted as a shell command line. Within the text of that shell command line, the unescaped character % shall be replaced with the remembered pathname; if a ! appears as the first character of the command, it shall be replaced with the text of the previous shell command executed via !. Thus, !! shall repeat the previous !command. If any replacements of % and/or ! are performed, the modified line shall be written to the standard output before command is executed. The ! command shall write

    "!\n"

to standard output upon completion, unless the -s option is specified. The current line number shall be unchanged.

4.20.7.3.30 Null Command

Synopsis:

    (.+1)<newline>

An address alone on a line shall cause the addressed line to be written. A <newline> alone shall be equivalent to .+1p. The current line number shall be set to the address of the written line.

4.20.8 Exit Status

The ed utility shall exit with one of the following values:

    0    Successful completion without any file or command errors.

    >0   An error occurred.
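The % and ! substitutions performed by the ! command (4.20.7.3.29) can be sketched as a pure function. The caller would echo the expanded line when changed is True, run it through the shell, and then write "!\n" unless -s was given. The function name is hypothetical. How \% is handled (backslash removed, % kept literally) is an assumption about what "unescaped" implies.

```python
def expand_shell_escape(command, remembered, previous):
    """Sketch of the substitutions ed performs on the text after !.

    command:    text after the ! command (so "!!" in ed arrives here as "!")
    remembered: the remembered pathname, or None
    previous:   the previous shell command run via !, or None
    Returns (expanded, changed); when changed is True, ed echoes the line.
    Assumption: \\% yields a literal % and the backslash is dropped.
    """
    changed = False
    prefix, body = "", command
    if command.startswith("!"):
        if previous is None:
            raise ValueError("?")         # nothing to repeat
        prefix, body, changed = previous, command[1:], True
    out, i = [], 0
    while i < len(body):
        if body.startswith("\\%", i):
            out.append("%")               # escaped %: no substitution
            i += 2
        elif body[i] == "%":
            if remembered is None:
                raise ValueError("?")     # no remembered pathname
            out.append(remembered)
            changed = True
            i += 1
        else:
            out.append(body[i])
            i += 1
    # The previous command is prepended unexpanded, so it is not rescanned for %.
    return prefix + "".join(out), changed
```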

4.20.9 Consequences of Errors

When an error in the input script is encountered, or when an error is detected that is a consequence of the data (not) present in the file or due to an external condition such as a read or write error:

  - If the standard input is a terminal device file, all input shall be flushed, and a new command read.

  - If the standard input is a regular file, ed shall terminate with a nonzero exit status.

BEGIN_RATIONALE

4.20.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

Some historical implementations contained a bug that allowed a single period to be entered in input mode as <backslash><period>. This is not allowed by the POSIX.2 ed because there is no description of escaping any of the characters in input mode; backslashes are entered into the buffer exactly as typed. The typical method of entering a single period has been to precede it with another character and then use the substitute command to delete that character.

Because of the extremely terse nature of the default error messages, the prudent script writer will begin the ed input commands with an H command, so that if any errors do occur at least some clue as to the cause will be made available.

History of Decisions Made

The initial description of this utility was adapted from the SVID. It contains some features not found in Version 7 or BSD-derived systems. Some of the differences between the POSIX.2 and BSD ed utilities include, but need not be limited to:

  - The BSD - option does not suppress the ! prompt after a ! command.

  - BSD does not support the special meanings of the % and ! characters within a ! command.

  - BSD does not support the addresses ; and ,.

  - BSD allows the command/suffix pairs pp, ll, etc., which are unspecified in POSIX.2.

  - BSD does not support the ! character as part of the e, r, or w commands.

  - A failed g command in BSD sets the line number to the last line searched if there are no matches.

  - BSD does not default the command list to the p command.

  - BSD does not support the G, h, H, n, or V commands.

  - On BSD, if there is no inserted text, the insert command changes the current line to the referenced line minus 1; i.e., the line before the specified line.

  - On BSD, the join command with only a single address changes the current line to that address.

  - BSD does not support the P command; moreover, in BSD it is synonymous with the p command.

  - BSD does not support the undo of the commands j, m, r, s, or t.

  - The BSD ed commands W, wq, and z are not present in POSIX.2.

The -s option was added to allow the functionality of the - option in a manner compatible with the Utility Syntax Guidelines. It is the intent of the working group that portable applications use the -s option, and that in the future the - option be removed from the standard.

Prior to Draft 8 there was a limit, {ED_FILE_MAX}, which described the historical limitations of some ed utilities in their handling of large files; some of these have had problems with files in the >100KB range. It was this limitation that prompted much of the desire to include a split command in the standard. Since this limit was removed, the standard requires that implementations document the file size limits imposed by ed in the conformance document. The limit {ED_LINE_MAX} was also removed; therefore, the global limit {LINE_MAX} is used for input and output lines.

The \{m,n\} notation was removed from the description of regular expressions because this functionality is now described in 2.8.3.
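The two usage tips from the Examples, Usage portion of this rationale can be combined in a script generator. The tips are to start scripts with H, and to enter a lone period by prefixing another character that is removed afterwards with s. build_ed_script() is a hypothetical helper that only builds the text of the script. It appends at address 0, which assumes the script starts from an empty buffer.

```python
def build_ed_script(lines, path):
    """Build an ed script that writes `lines` to `path` (hypothetical helper).

    The script starts with H so that errors are explained, and any line that
    is exactly "." is entered as "x." and repaired afterwards with s/^x//,
    because input mode has no escape for a lone period.
    """
    script = ["H", "0a"]          # assumes an empty buffer: append at address 0
    fixups = []
    for number, text in enumerate(lines, start=1):
        if text == ".":
            script.append("x.")
            fixups.append(number)  # line address once input mode ends
        else:
            script.append(text)
    script.append(".")            # leave input mode
    script.extend("%ds/^x//" % n for n in fixups)
    script.extend(["w " + path, "q"])
    return "\n".join(script) + "\n"
```

Assuming an ed executable is on PATH, the result could be fed to it with, for example, subprocess.run(["ed", "-s"], input=script, text=True).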

The manner in which the l command writes nonprintable characters was changed to avoid the historical backspace-overstrike method. On video display terminals, the overstrike is ambiguous because most terminals simply replace overstruck characters, making the l format not useful for its intended purpose of unambiguously understanding the content of the line. The historical backslash escapes were also ambiguous. (The string "a\0011" could represent a line containing those six characters or a line containing the three characters 'a', a byte with a binary value of 1, and a '1'.) In the format required here, a backslash appearing in the line will be written as "\\" so that the output is truly unambiguous. The method of marking the ends of lines was adopted from the ex editor (see the User Portability Extension) and is required for any line ending in <space>s; the $ is placed on all lines so that a real $ at the end of a line cannot be misinterpreted.

Systems with bytes too large to fit into three octal digits must devise other means of displaying nonprintable characters. Consideration was given to requiring that the number of octal digits be large enough to hold a byte, but this seemed to be too confusing for applications on the vast majority of systems where three digits are adequate. It would be theoretically possible for the application to use the getconf utility to find out the {CHAR_BIT} value and deal with such an algorithm; however, there is really no portable way that an application can use the octal values of the bytes across various coded character sets anyway, so the additional specification did not seem worth the effort.

The description of how a NUL is written was removed.
The NUL character cannot be in text files, and the standard should not dictate behavior in the case of undefined, erroneous input.

The text requiring filenames accepted by the E, e, R, and r commands to be patterns was removed due to balloting objections that this was undesirable and not existing practice.

The -p option in Drafts 8 and 9 said that it only worked when standard input was associated with a terminal device. This has been changed to conform to existing implementations, thereby allowing applications to interpose themselves between a user and the ed utility.

The form of the substitute command that uses a numeric count suffix was limited to the first 512 matches in a previous draft (where this was described incorrectly as ``backreferencing''). This limit has been removed because there is no reason an editor processing lines of {LINE_MAX} length should have this restriction. The command s/x/X/2047 should be able to substitute the 2047th occurrence of x on a line.

The use of printing commands with printing suffixes (such as pn, lp, etc.) was made unspecified because BSD-based systems allow this, whereas System V does not.

Some BSD-based systems exit immediately upon receipt of end-of-file if all of the lines in the file had been deleted. Since POSIX.2 specifies that end-of-file in this situation be treated as a q command, such behavior is not allowed.

Some historical implementations returned exit status zero even if command errors had occurred; this is not allowed by POSIX.2.

END_RATIONALE

4.21 env - Set environment for command invocation

4.21.1 Synopsis

    env [-i] [name=value] ... [utility [argument ...]]

Obsolescent Version:

    env [-] [name=value] ...
        [utility [argument ...]]

4.21.2 Description

The env utility shall obtain the current environment, modify it according to its arguments, then invoke the utility named by the utility operand with the modified environment. Optional arguments shall be passed to utility. If no utility operand is specified, the resulting environment shall be written to the standard output, with one name=value pair per line.

4.21.3 Options

The env utility shall conform to the utility argument syntax guidelines described in 2.10.2, except for its nonstandard usage of -, which is obsolescent. The following options shall be supported by the implementation:

    -i   Invoke utility with exactly the environment specified by the arguments; the inherited environment shall be ignored completely.

    -    (Obsolescent.) Equivalent to the -i option.

4.21.4 Operands

The following operands shall be supported by the implementation:

    name=value   Arguments of the form name=value modify the execution environment, and are placed into the inherited environment before the utility is invoked.

    utility      The name of the utility to be invoked. If the utility operand names any of the special built-in utilities in 3.14, the results are undefined.

    argument     A string to pass as an argument for the invoked utility.

4.21.5 External Influences

4.21.5.1 Standard Input

None.

4.21.5.2 Input Files

None.

4.21.5.3 Environment Variables

The following environment variables shall affect the execution of env:

    LANG         This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

    LC_ALL       This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

    LC_CTYPE     This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

    LC_MESSAGES  This variable shall determine the language in which messages should be written.

    PATH         This variable shall determine the location of the utility, as described in 2.6. If PATH is specified as a name=value operand to env, the value given shall be used in the search for utility.

4.21.5.4 Asynchronous Events

Default.

4.21.6 External Effects

4.21.6.1 Standard Output

If no utility operand is specified, each name=value pair in the resulting environment shall be written in the form:

    "%s=%s\n", <name>, <value>

If the utility operand is specified, the env utility shall not write to standard output.

4.21.6.2 Standard Error

Used only for diagnostic messages.

4.21.6.3 Output Files

None.

4.21.7 Extended Description

None.

4.21.8 Exit Status

If utility is invoked, the exit status of env shall be the exit status of utility; otherwise, the env utility shall exit with one of the following values:

    0        The env utility completed successfully.

    1-125    An error occurred in the env utility.

    126      The utility specified by utility was found but could not be invoked.

    127      The utility specified by utility could not be found.
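The following Python sketch of env's behavior rests on stated assumptions:

  - env_plan() separates the -i (or obsolescent -) option and the name=value operands from the utility and its arguments.
  - run_utility() searches the PATH taken from the new environment, as the PATH entry above requires. It falls back to os.defpath when PATH is absent; this is an assumption, since the standard does not say what happens then.
  - exec_failure_status() applies the KornShell-style 126/127 rule described in this subclause's rationale.

All function names are hypothetical.

```python
import errno
import os


def env_plan(argv, inherited):
    """Model env's argument handling without executing anything.

    argv excludes the program name. Returns (environment, command); an empty
    command means env writes the environment instead of invoking a utility.
    Simplification: "--" and other option-parsing details are not handled.
    """
    args = list(argv)
    environment = dict(inherited)
    if args and args[0] in ("-i", "-"):      # "-" is the obsolescent form
        environment = {}
        args.pop(0)
    while args and "=" in args[0]:           # name=value operands
        name, _, value = args.pop(0).partition("=")
        environment[name] = value
    return environment, args


def format_environment(environment):
    """Output for the no-utility case: one "%s=%s\\n" line per pair."""
    return "".join("%s=%s\n" % item for item in environment.items())


def exec_failure_status(errors):
    """127 if every exec attempt failed with ENOENT (or none was made), else 126."""
    if all(e == errno.ENOENT for e in errors):
        return 127                           # utility not found
    return 126                               # found, but could not be invoked


def run_utility(command, environment):
    """Search PATH from the new environment and exec; return a status on failure."""
    utility, errors = command[0], []
    if "/" in utility:
        candidates = [utility]
    else:
        search = environment.get("PATH", os.defpath)  # assumed fallback
        candidates = [os.path.join(d or ".", utility) for d in search.split(":")]
    for path in candidates:
        try:
            os.execve(path, command, environment)     # does not return on success
        except OSError as exc:
            errors.append(exc.errno)
    return exec_failure_status(errors)
```

With the rationale's example, env_plan(["-i", "PATH=/mybin", "mygrep", "xyz", "myfile"], os.environ) yields an environment containing only PATH=/mybin, so mygrep is looked for in /mybin alone.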

4.21.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.21.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The following command:

    env -i PATH=/mybin mygrep xyz myfile

invokes the command mygrep with a new PATH value as the only entry in its environment. In this case, PATH is used to locate mygrep, which then must reside in /mybin.

As with all other utilities that invoke other utilities, the standard only specifies what env does with standard input, standard output, standard error, input files, and output files. If a utility is executed, it is not constrained by env's specification of input and output.

The command, env, nohup, and xargs utilities have been specified to use exit code 127 if the utility to be invoked cannot be found, so that applications can distinguish ``failure to find a utility'' from ``invoked utility exited with an error indication.'' The value 127 was chosen because it is not commonly used for other meanings; most utilities use small values for ``normal error conditions'' and the values above 128 can be confused with termination due to receipt of a signal. The value 126 was chosen in a similar manner to indicate that the utility could be found, but not invoked. Some scripts produce meaningful error messages differentiating the 126 and 127 cases. The distinction between exit codes 126 and 127 is based on KornShell practice that uses 127 when all attempts to exec the utility fail with [ENOENT], and uses 126 when any attempt to exec the utility fails for any other reason.

History of Decisions Made

The -i option was added to allow the functionality of the - option in a manner compatible with the Utility Syntax Guidelines. It is the intent of the working group that portable applications use the -i option, and
that in the future the - option be removed from the standard.

Historical implementations of the env utility use execvp() or execlp() (see POSIX.1 {8} 3.1.2) to invoke the specified utility; this provides better performance and keeps users from having to escape characters with special meaning to the shell. Therefore, shell functions, special built-ins, and built-ins that are only provided by the shell are not found. Implementations are free to invoke a shell instead of using one of the exec family of routines, but if they do, they must be sure to escape any characters with special meaning to the shell so that the user does not have to be aware of the difference.

Some have suggested that env is redundant since the same effect is achieved by:

    name=value ... utility [argument ...]

The example is equivalent to env when an environment variable is being added to the environment of the command, but not when the environment is being set to the given value. The env utility also writes out the current environment if invoked without arguments. There is sufficient functionality beyond what the example provides to justify inclusion of env.

END_RATIONALE

4.22 expr - Evaluate arguments as an expression

4.22.1 Synopsis

    expr operand ...

4.22.2 Description

The expr utility shall evaluate an expression and write the result to standard output.

4.22.3 Options

None.

4.22.4 Operands

The single expression evaluated by expr shall be formed from the operands, as described in 4.22.7. Each of the expression operator symbols:

    (  )  |  &  =  >  >=  <  <=  !=  +  -  *  /  %  :

and the symbols integer and string in the table shall be provided by the application as separate arguments to expr.
4.22.5 External Influences

4.22.5.1 Standard Input

None.

4.22.5.2 Input Files

None.

4.22.5.3 Environment Variables

The following environment variables shall affect the execution of expr:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_COLLATE
    This variable shall determine the locale for the behavior of ranges, equivalence classes, and multicharacter collating elements within regular expressions and by the string comparison operators.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments) and the behavior of character classes within regular expressions.

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.22.5.4 Asynchronous Events

Default.

4.22.6 External Effects

4.22.6.1 Standard Output

The expr utility shall evaluate the expression and write the result to standard output. The character '0' shall be written to indicate a zero value and nothing shall be written to indicate a null string.

4.22.6.2 Standard Error

Used only for diagnostic messages.

4.22.6.3 Output Files

None.

4.22.7 Extended Description

The formation of the expression to be evaluated is shown in Table 4-7.
The symbols expr, expr1, and expr2 represent expressions formed from integer and string symbols and the expression operator symbols (all separate arguments) by recursive application of the constructs described in the table. The expressions in Table 4-7 are listed in order of increasing precedence, with equal-precedence operators grouped between horizontal lines. All of the operators shall be left-associative.

                      Table 4-7 - expr Expressions

  ____________________________________________________________________
  Expression        Description
  ____________________________________________________________________
  expr1 | expr2     Returns the evaluation of expr1 if it is neither
                    null nor zero; otherwise, returns the evaluation
                    of expr2.
  ____________________________________________________________________
  expr1 & expr2     Returns the evaluation of expr1 if neither
                    expression evaluates to null or zero; otherwise,
                    returns zero.
  ____________________________________________________________________
                    Returns the result of a decimal integer comparison
                    if both arguments are integers; otherwise, returns
                    the result of a string comparison using the
                    locale-specific collation sequence. The result of
                    each comparison shall be 1 if the specified
                    relation is true, or 0 if the relation is false.
  expr1 = expr2     Equal.
  expr1 > expr2     Greater than.
  expr1 >= expr2    Greater than or equal.
  expr1 < expr2     Less than.
  expr1 <= expr2    Less than or equal.
  expr1 != expr2    Not equal.
  ____________________________________________________________________
  expr1 + expr2     Addition of decimal integer-valued arguments.
  expr1 - expr2     Subtraction of decimal integer-valued arguments.
  ____________________________________________________________________
  expr1 * expr2     Multiplication of decimal integer-valued
                    arguments.
  expr1 / expr2     Integer division of decimal integer-valued
                    arguments, producing an integer result.
  expr1 % expr2     Remainder of integer division of decimal
                    integer-valued arguments.
  ____________________________________________________________________
  expr1 : expr2     Matching expression. See 4.22.7.1.
  ____________________________________________________________________
  ( expr )          Grouping symbols. Any expression can be placed
                    within parentheses. Parentheses can be nested to
                    a depth of {EXPR_NEST_MAX}.
  ____________________________________________________________________
  integer           An argument consisting only of an (optional)
                    unary minus followed by digits.
  string            A string argument. See 4.22.7.2.
  ____________________________________________________________________

4.22.7.1 Matching Expression

The ':' matching operator shall compare the string resulting from the evaluation of expr1 with the regular expression pattern resulting from the evaluation of expr2.
Regular expression syntax shall be that defined in 2.8.3 (Basic Regular Expressions), except that all patterns are ``anchored'' to the beginning of the string (that is, only sequences starting at the first character of a string shall be matched by the regular expression) and, therefore, it is unspecified whether ^ is a special character in that context. Usually, the matching operator shall return a string representing the number of characters matched ("0" on failure). Alternatively, if the pattern contains at least one regular expression subexpression [\(...\)], the string corresponding to \1 shall be returned (see 2.8.3.3).

4.22.7.2 String Operand

A string argument is an argument that cannot be identified as an integer argument or as one of the expression operator symbols shown in 4.22.4.

The use of string arguments length, substr, index, or match produces unspecified results.

4.22.8 Exit Status

The expr utility shall exit with one of the following values:

     0   If the expression evaluates to neither null nor zero.
     1   If the expression evaluates to null or zero.
     2   For invalid expressions.
    >2   An error occurred.

4.22.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.22.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The expr utility has a rather difficult syntax:

  - Many of the operators are also shell control operators or reserved words, so they have to be escaped on the command line.

  - Each part of the expression is composed of separate arguments, so liberal usage of <blank>s is required.
For example:

    Invalid                  Valid
    ________________         _______________________
    expr 1+2                 expr 1 + 2
    expr "1 + 2"             expr 1 + 2
    expr 1 + (2 * 3)         expr 1 + \( 2 \* 3 \)

In many cases, the arithmetic and string features provided as part of the shell command language are easier to use than their equivalents in expr; the utility was retained by POSIX.2 as acknowledgment of the many historical shell scripts that use it. Newly written scripts should avoid expr in favor of the new features within the shell.

The following command:

    a=$(expr $a + 1)

adds 1 to the variable a. A new application should use:

    a=$(($a+1))

The following command, for $a equal to either /usr/abc/file or just file:

    expr $a : '.*/\(.*\)' \| $a

returns the last segment of a pathname (i.e., file). Applications should avoid the character / used alone as an argument: expr may interpret it as the division operator.

The following command:

    expr "//$a" : '.*/\(.*\)'

is a better representation of the previous example. The addition of the // characters eliminates any ambiguity about the division operator and simplifies the whole expression. Also note that pathnames may contain characters contained in the IFS variable and should be quoted to avoid having $a expand into multiple arguments.

The following command:

    expr "$VAR" : '.*'

returns the number of characters in VAR.

Usage Warning: After argument processing by the shell, expr is not required to be able to tell the difference between an operator and an operand except by the value. If $a is =, the command:

    expr $a = '='

looks like:

    expr = = =

as the arguments are passed to expr (and they all may be taken as the = operator). The following works reliably:

    expr X$a = X=

Also note that this standard permits implementations to extend utilities.
The expr utility permits the integer arguments to be preceded with a unary minus. This means that an integer argument could look like an option. Therefore, a portable application must employ the "--" construct of Guideline 10 (see 2.10.2) to protect its operands if there is any chance the first operand might be a negative integer (or any string with a leading minus).

History of Decisions Made

In an earlier draft, Extended Regular Expressions were used in the matching expression syntax. This was changed to the Basic variety to avoid breaking historical applications.

The use of a leading circumflex in the regular expression is unspecified because many historical implementations have treated it as special, despite their system documentation. For example,

    expr foo : ^foo
    expr ^foo : ^foo

return 3 and 0, respectively, on those systems; their documentation would imply the reverse. Thus, the anchoring condition is left unspecified to avoid breaking historical scripts relying on this undocumented feature.

END_RATIONALE

4.23 false - Return false value

4.23.1 Synopsis

    false

4.23.2 Description

The false utility shall return with a nonzero exit code.

4.23.3 Options

None.

4.23.4 Operands

None.

4.23.5 External Influences

4.23.5.1 Standard Input

None.

4.23.5.2 Input Files

None.

4.23.5.3 Environment Variables

None.

4.23.5.4 Asynchronous Events

Default.

4.23.6 External Effects

4.23.6.1 Standard Output

None.

4.23.6.2 Standard Error

None.

4.23.6.3 Output Files

None.

4.23.7 Extended Description

None.

4.23.8 Exit Status

The false utility always shall exit with a value other than zero.
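The false utility is mostly used as the condition of a shell control structure such as while. The following minimal POSIX sh sketch relies only on the nonzero exit status required above. The particular value is not specified, so the code checks only that the status is not zero. The three-pass limit in the until loop is an arbitrary choice for illustration.

```sh
#!/bin/sh
# false must exit with some nonzero value; the exact value is not
# specified, so only "not zero" is checked here.
false
status=$?

# As a while condition, false never holds, so the body never runs;
# this is one way to disable a loop without deleting it.
iterations=0
while false; do
    iterations=$((iterations + 1))
done

# As an until condition, false never ends the loop by itself, so the
# body has to break out (here after three passes, an arbitrary limit).
count=0
until false; do
    count=$((count + 1))
    if [ "$count" -ge 3 ]; then
        break
    fi
done

printf 'false exit status: %s\n' "$status"
printf 'while body: %s passes; until body: %s passes\n' "$iterations" "$count"
```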
4.23.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.23.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The false utility is typically used in shell control structures like while.

END_RATIONALE

4.24 find - Find files

4.24.1 Synopsis

    find path ... [operand_expression ...]

4.24.2 Description

The find utility shall recursively descend the directory hierarchy from each file specified by path, evaluating a Boolean expression composed of the primaries described in 4.24.4 for each file encountered.

The find utility shall be able to descend to arbitrary depths in a file hierarchy and shall not fail due to path length limitations (unless a path operand specified by the application exceeds {PATH_MAX} requirements).

The find utility requires that the underlying system provide information equivalent to the st_dev, st_mode, st_nlink, st_uid, st_gid, st_size, st_atime, st_mtime, and st_ctime members of struct stat described by POSIX.1 {8} 5.6 and conforming to the file times update definition in 2.2.2.69.

4.24.3 Options

None.

4.24.4 Operands

The following operands shall be supported by the implementation:

The path operand is a pathname of a starting point in the directory hierarchy.

The first argument that starts with a -, or is a ! or a (, and all subsequent arguments shall be interpreted as an expression made up of the following primaries and operators. In the descriptions, wherever n is used as a primary argument, it shall be interpreted as a decimal integer optionally preceded by a plus (+) or minus (-) sign, as follows:
    +n   More than n
     n   Exactly n
    -n   Less than n

Implementations shall recognize the following primaries:

Editor's Note: These primaries have been sorted alphabetically, without diff marks.

-atime n
    The primary shall evaluate as true if the file access time subtracted from the initialization time is n-1 to n multiples of 24 hours. The initialization time shall be a time between the invocation of the find utility and the first access by that invocation of the find utility to any file specified by its path operands.

-ctime n
    The primary shall evaluate as true if the time of last change of file status information subtracted from the initialization time is n-1 to n multiples of 24 hours. The initialization time shall be a time between the invocation of the find utility and the first access by that invocation of the find utility to any file specified by its path operands.

-depth
    The primary always shall evaluate as true; it shall cause descent of the directory hierarchy to be done so that all entries in a directory are acted on before the directory itself. If a -depth primary is not specified, all entries in a directory shall be acted on after the directory itself. If any -depth primary is specified, it shall apply to the entire expression even if the -depth primary would not normally be evaluated.

-exec utility_name [argument ...] ;
    The primary shall evaluate as true if the invoked utility utility_name returns a zero value as exit status. The end of the primary expression shall be punctuated by a semicolon. A utility_name or argument containing only the two characters {} shall be replaced by the current pathname.
    If a utility_name or argument string contains the two characters {}, but not just the two characters {}, it is implementation-defined whether find replaces those two characters with the current pathname or uses the string without change. The current directory for the invocation of utility_name shall be the same as the current directory when the find utility was started. If the utility_name names any of the special built-in utilities in 3.14, the results are undefined.

-group gname
    The primary shall evaluate as true if the file belongs to the group gname. If gname is a decimal integer and the getgrnam() (or equivalent) function does not return a valid group name, gname shall be interpreted as a group ID.

-links n
    The primary shall evaluate as true if the file has n links.

-mtime n
    The primary shall evaluate as true if the file modification time subtracted from the initialization time is n-1 to n multiples of 24 hours. The initialization time shall be a time between the invocation of the find utility and the first access by that invocation of the find utility to any file specified by its path operands.

-name pattern
    The primary shall evaluate as true if the basename of the filename being examined matches pattern using the pattern matching notation described in 3.13.

-newer file
    The primary shall evaluate as true if the modification time of the current file is more recent than the modification time of the file named by the pathname file.

-nogroup
    The primary shall evaluate as true if the file belongs to a group ID for which the POSIX.1 {8} getgrgid() (or equivalent) function returns NULL.
-nouser
    The primary shall evaluate as true if the file belongs to a user ID for which the POSIX.1 {8} getpwuid() (or equivalent) function returns NULL.

-ok utility_name [argument ...] ;
    The -ok primary shall be equivalent to -exec, except that find shall request affirmation of the invocation of utility_name using the current file as an argument by writing to standard error as described in 4.24.6.2. If the response on standard input is affirmative, the utility shall be invoked. Otherwise, the command shall not be invoked and the value of the -ok operand shall be false.

-perm [-]mode
    The mode argument is used to represent file mode bits. It shall be identical in format to the symbolic_mode operand described in 4.7, and shall be interpreted as follows. To start, a template shall be assumed with all file mode bits cleared. An op symbol of + shall set the appropriate mode bits in the template; - shall clear the appropriate bits; = shall set the appropriate mode bits, without regard to the contents of the process's file mode creation mask. The op symbol of - cannot be the first character of mode. If the hyphen is omitted, the primary shall evaluate as true when the file permission bits exactly match the value of the resulting template. Otherwise, if mode is prefixed by a hyphen, the primary shall evaluate as true if at least all the bits in the resulting template are set in the file permission bits.

-perm [-]onum
    (Obsolescent.) If the hyphen is omitted, the primary shall evaluate as true when the file permission bits exactly match the value of the octal number onum and only the bits corresponding to the octal mask 07777 shall be compared. (See the description of the octal mode in 4.7.)
    Otherwise, if onum is prefixed by a hyphen, the primary shall evaluate as true if at least all of the bits specified in onum that are also set in the octal mask 07777 are set.

-print
    The primary always shall evaluate as true; it shall cause the current pathname to be written to standard output.

-prune
    The primary always shall evaluate as true; it shall cause find not to descend the current pathname if it is a directory. If the -depth primary is specified, the -prune primary shall have no effect.

-size n[c]
    The primary shall evaluate as true if the file size in bytes, divided by 512 and rounded up to the next integer, is n. If n is followed by the character c, the size shall be in bytes.

-type c
    The primary shall evaluate as true if the type of the file is c, where c is b, c, d, p, or f for block special file, character special file, directory, FIFO, or regular file, respectively.

-user uname
    The primary shall evaluate as true if the file belongs to the user uname. If uname is a decimal integer and the getpwnam() (or equivalent) function does not return a valid user name, uname shall be interpreted as a user ID.

-xdev
    The primary always shall evaluate as true; it shall cause find not to continue descending past directories that have a different device ID (st_dev, see POSIX.1 {8} 5.6.2). If any -xdev primary is specified, it shall apply to the entire expression even if the -xdev primary would not normally be evaluated.

The primaries can be combined using the following operators (in order of decreasing precedence):

( expression )
    True if expression is true.

! expression
    Negation of a primary; the unary NOT operator.
expression [-a] expression
    Conjunction of primaries; the AND operator shall be implied by the juxtaposition of two primaries or made explicit by the optional -a operator. The second expression shall not be evaluated if the first expression is false.

expression -o expression
    Alternation of primaries; the OR operator. The second expression shall not be evaluated if the first expression is true.

If no expression is present, -print shall be used as the expression. Otherwise, if the given expression does not contain any of the primaries -exec, -ok, or -print, the given expression shall be effectively replaced by:

    ( given_expression ) -print

The -user, -group, and -newer primaries each shall evaluate their respective arguments only once.

4.24.5 External Influences

4.24.5.1 Standard Input

If the -ok primary is used, the response shall be read from the standard input. An entire line shall be read as the response. Otherwise, the standard input shall not be used.

4.24.5.2 Input Files

None.

4.24.5.3 Environment Variables

The following environment variables shall affect the execution of find:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
LC_COLLATE
    This variable shall determine the locale for the behavior of ranges, equivalence classes, and multicharacter collating elements used in the pattern matching notation for the -name primary and in the extended regular expression defined for the yesexpr locale keyword in the LC_MESSAGES category.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments), the behavior of character classes within the pattern matching notation used for the -name primary, and the behavior of character classes within regular expressions used in the extended regular expression defined for the yesexpr locale keyword in the LC_MESSAGES category.

LC_MESSAGES
    This variable shall determine the processing of affirmative responses and the language in which messages should be written.

PATH
    This variable shall determine the location of the utility_name for the -exec and -ok primaries, as described in 2.6.

4.24.5.4 Asynchronous Events

Default.

4.24.6 External Effects

4.24.6.1 Standard Output

The -print primary shall cause the current pathnames to be written to standard output. The format shall be:

    "%s\n", <path>

4.24.6.2 Standard Error

The -ok primary shall write a prompt to standard error containing at least the utility_name to be invoked and the current pathname. In the POSIX Locale, the last non-<blank> character in the prompt shall be ?. The exact format used is unspecified.

Otherwise, the standard error shall be used only for diagnostic messages.

4.24.6.3 Output Files

None.

4.24.7 Extended Description

None.

4.24.8 Exit Status

The find utility shall exit with one of the following values:

     0   All path operands were traversed successfully.
    >0   An error occurred.
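Several primaries in 4.24.4 reduce to small integer comparisons: the +n / n / -n convention, the 512-byte rounding of -size, and the exact versus at-least bit tests of -perm. The sketch below models these comparisons with POSIX sh arithmetic. The function names are hypothetical, and the functions work on numbers supplied by the caller rather than on real files. Mode values are written as octal constants with a leading 0, which this sketch assumes $(( )) interprets as octal, as current POSIX shells do.

```sh
#!/bin/sh
# Hypothetical helpers modelling find's numeric and permission tests.

# num_match VALUE SPEC: SPEC is n, +n, or -n as in 4.24.4
# (exactly n, more than n, less than n).
num_match() {
    case $2 in
        +*) [ "$1" -gt "${2#+}" ] ;;
        -*) [ "$1" -lt "${2#-}" ] ;;
        *)  [ "$1" -eq "$2" ] ;;
    esac
}

# size_blocks BYTES: size in 512-byte units, rounded up, as compared
# by -size n when n has no trailing c.
size_blocks() {
    echo $(( ($1 + 511) / 512 ))
}

# perm_match MODE TEMPLATE: octal constants such as 0755. Without a
# leading hyphen the permission bits must equal the template exactly;
# with one (e.g. -0111) every template bit must be set in MODE.
perm_match() {
    bits=$(( $1 & 07777 ))
    case $2 in
        -*) tmpl=$(( ${2#-} & 07777 ))
            [ $(( bits & tmpl )) -eq "$tmpl" ] ;;
        *)  tmpl=$(( $2 & 07777 ))
            [ "$bits" -eq "$tmpl" ] ;;
    esac
}

# A 1000-byte file occupies 2 units of 512 bytes, so -size 2 and
# -size +1 would be true for it, while -size -2 would not.
blocks=$(size_blocks 1000)
num_match "$blocks" 2 && printf '%s\n' "-size 2: true"
num_match "$blocks" +1 && printf '%s\n' "-size +1: true"
num_match "$blocks" -2 || printf '%s\n' "-size -2: false"

perm_match 0755 0755 && printf '%s\n' "-perm 0755 on mode 0755: true"
perm_match 0755 -0111 && printf '%s\n' "-perm -0111 on mode 0755: true"
perm_match 0644 -0111 || printf '%s\n' "-perm -0111 on mode 0644: false"
```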
4.24.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.24.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

When used in operands, pattern matching notation, semicolons, opening parentheses, and closing parentheses are special to the shell and must be quoted (see 3.2).

The following command:

    find / \( -name tmp -o -name '*.xx' \) \
        -atime +7 -exec rm {} \;

removes all files named tmp or ending in .xx that have not been accessed for seven or more 24-hour periods.

The following command:

    find . -perm -o+w,+s

prints (-print is assumed) the names of all files in or below the current directory, with all of the file permission bits S_ISUID, S_ISGID, and S_IWOTH set.

The -prune primary was adopted from later releases of 4.3BSD and the third edition of the SVID. The following command recursively prints pathnames of all files in the current directory and below, but skips directories named SCCS and files in them.

    find . -name SCCS -prune -o -print

The following command behaves as in the previous example, but prints the names of the SCCS directories.

    find . -print -name SCCS -prune

The following command is roughly equivalent to the -nt extension to test:

    if [ -n "$(find file1 -prune -newer file2)" ]; then
        printf %s\\n "file1 is newer than file2"
    fi

History of Decisions Made

The historical -a operator is kept as an optional operator for compatibility with existing shell scripts even though it is redundant with expression concatenation.
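The redundancy of -a, along with the operator precedence and implicit -print rules of 4.24.4, can be observed directly in a scratch directory. This sketch assumes only that the script may create and then remove a temporary directory; the directory name is an arbitrary choice. It shows three things: an explicit -a behaves the same as juxtaposition, AND binds more tightly than -o, and the implicit -print wraps the whole expression.

```sh
#!/bin/sh
# Scratch directory; the name is arbitrary and removed at the end.
dir=${TMPDIR:-/tmp}/find-demo.$$
mkdir "$dir" && cd "$dir" || exit 1
touch a b c

# Explicit -a and plain juxtaposition are the same AND operator.
with_a=$(find . -type f -a -name a)
juxtaposed=$(find . -type f -name a)

# AND binds more tightly than -o, so this is read as
#   -name a -o ( -name b -print )
# and only ./b is printed.
no_parens=$(find . -name a -o -name b -print)

# Parentheses (escaped from the shell) make -print apply to both.
parens=$(find . \( -name a -o -name b \) -print | sort)

# No -exec, -ok, or -print: the expression is treated as
# ( given_expression ) -print, giving the same result as above.
implicit=$(find . -name a -o -name b | sort)

cd / && rm -rf "$dir"
printf '%s\n' "$with_a" "$juxtaposed" "$no_parens" "$parens" "$implicit"
```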
The symbolic means of specifying file permission bits, based on chmod, was added in response to numerous balloting objections that find was the only remaining utility to not support this method. The warning about a leading op of - is to avoid ambiguity with the optional leading hyphen. Since the initial mode is all bits off, there are not any symbolic modes that need to use - as the first character.

The bit that is traditionally used for sticky (historically 01000) is still specified in the -perm primary using the octal number argument form. Since this bit is not defined by POSIX.1 {8} or POSIX.2, applications must not assume that it actually refers to the traditional sticky bit.

The descriptions of how the - modifier on the mode and onum arguments to the -perm primary affects processing have been documented here to match the way it behaves in practice on historical BSD and System V implementations. System V and BSD documentation both describe it in terms of checking additional bits; in fact, it uses the same bits, but checks for having at least all of the matching bits set instead of having exactly the matching bits set.

The exact format of the interactive prompts is unspecified. Only the general nature of the contents of prompts is specified, because:

  (1) Implementations may desire more descriptive prompts than those used on historical implementations.

  (2) Since the traditional prompt strings do not terminate with <newline>s, there is no portable way for another program to interact with the prompts of this utility via pipes.

Therefore, an application using this prompting option relies on the system to provide the most suitable dialogue directly with the user, based on the general guidelines specified.

The -name file operand was changed to use the shell pattern matching notation so that find is consistent with other utilities using pattern matching.

For the -type c operand, implementors of symbolic links should consider l (the letter ell) for symbolic links.
Implementations that support sockets also use -type s for sockets. Implementations planning to add options to allow find to follow symbolic links or treat them as special files should consider using -follow as used in BSD and System V Release 4 as a guide.

The -size operand refers to the size of a file, rather than the number of blocks it may occupy in the file system. The intent is that the POSIX.1 {8} st_size field should be used, not the st_blocks found in historical implementations. There are at least two reasons for this:

  - In both System V and BSD, find only uses st_size in size calculations for the operands specified by POSIX.2. (BSD uses st_blocks only when processing the -ls primary.)

  - Users will usually be thinking of size in terms of the size of the file in bytes, which is also used by the ls utility for the output from the -l option. (In both System V and BSD, ls uses st_size for the -l option size field and uses st_blocks for the ls -s calculations. POSIX.2 does not specify ls -s.)

The descriptions of -atime, -ctime, and -mtime were changed from the SVID's description of n ``days'' to ``24-hour periods.'' For example, a file accessed at 23:59 will be selected by:

    find . -atime -1 -print

at 00:01 the next day (less than 24 hours later, not more than one day ago); the midnight boundary between days has no effect on the 24-hour calculation. The description is also different in terms of the exact timeframe for the n case (versus the +n or -n), but it matches all known historical implementations. It refers to one 24-hour period in the past, not any time from the beginning of that period to the current time. For example, -atime 3 is true if the file was accessed any time in the period from 72 to 48 hours ago.

Historical implementations do not modify {} when it appears as a substring of an -exec or -ok utility_name or argument string. There have been numerous user requests for this extension, so this standard allows the desired behavior. At least one recent implementation does support this feature, but ran into several problems in managing memory allocation and dealing with multiple occurrences of {} in a string while it was being developed, so it is not yet required behavior.

Assuming the presence of -print was added at the request of several working group members to correct a historical pitfall that plagues novice users. It is entirely upward compatible with the historical System V find utility and should be easy to implement. In its simplest form (find directory), it could be confused with the historical BSD fast find. The BSD developers agree that adding -print as a default expression is the right thing to do and believe that the fast find functionality should have been/should be provided by a separate utility. They suggest that the new utility be called locate.

END_RATIONALE

4.25 fold - Fold lines

4.25.1 Synopsis

    fold [-bs] [-w width] [file ...]

4.25.2 Description

The fold utility is a filter that shall fold lines from its input files, breaking the lines to have a maximum of width column positions (or bytes, if the -b option is specified). Lines shall be broken by the insertion of a <newline> character such that each output line (referred to later in this clause as a segment) is the maximum width possible that does not exceed the specified number of column positions (or bytes). A line shall not be broken in the middle of a character.
The behavior is undefined if width is less than the number of columns any single character in the input would occupy.

If the <backspace>, <carriage-return>, or <tab> characters are encountered in the input, and the -b option is not specified, they shall be treated specially:

<carriage-return>
    The current count of line width shall be set to zero. The fold utility shall not insert a <newline> immediately before or after any <carriage-return>.

<backspace>
    The current count of line width shall be decremented by one, although the count shall never become negative. The fold utility shall not insert a <newline> immediately before or after any <backspace>.

<tab>
    Each <tab> character encountered shall advance the column position pointer to the next tab stop. Tab stops shall be at each column position n such that n modulo 8 equals 1.

4.25.3 Options

The fold utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-b
    Count width in bytes rather than column positions.

-s
    If a segment of a line contains a <blank> within the first width column positions (or bytes), break the line after the last such <blank> meeting the width constraints. If there is no <blank> meeting the requirements, the -s option shall have no effect for that output segment of the input line.

-w width
    Specify the maximum line length, in column positions (or bytes if -b is specified). The results are unspecified if width is not a positive decimal number. The default value shall be 80.

4.25.4 Operands

The following operand shall be supported by the implementation:

file
    A pathname of a text file to be folded. If no file operands are specified, the standard input shall be used.

4.25.5 External Influences

4.25.5.1 Standard Input

The standard input shall be used only if no file operands are specified. See Input Files.
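The -w and -s options described in 4.25.3 can be seen at work by folding short strings. The commands below are an informal sketch, not normative text. They assume an implementation that follows this description and ASCII input in the POSIX locale, where each character occupies one column position.

```shell
#!/bin/sh
# Informal illustration of fold -w and fold -s on ASCII input.

# -w 4: each segment holds at most 4 column positions, so the line is
# broken in the middle of the "word".
printf 'abcdefghij\n' | fold -w 4
# writes three lines: abcd, efgh, ij

# -s -w 8: the first 8 columns ("aaa bbb ") contain <blank>s, so the line
# is broken after the last <blank> that fits; that <blank> stays at the
# end of the first output line.
printf 'aaa bbb ccc\n' | fold -s -w 8
# writes two lines: "aaa bbb " and "ccc"
```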
4.25.5.2 Input Files

If the -b option is specified, the input files shall be text files except that the lines are not limited to {LINE_MAX} bytes in length. If the -b option is not specified, the input files shall be text files.

4.25.5.3 Environment Variables

The following environment variables shall affect the execution of fold:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files) and for the determination of the width in column positions each character would occupy on a constant-width-font output device.

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.25.5.4 Asynchronous Events

Default.

4.25.6 External Effects

4.25.6.1 Standard Output

The standard output shall be a file containing a sequence of characters whose order shall be preserved from the input file(s), possibly with inserted <newline> characters.

4.25.6.2 Standard Error

Used only for diagnostic messages.

4.25.6.3 Output Files

None.

4.25.7 Extended Description

None.

4.25.8 Exit Status

The fold utility shall exit with one of the following values:

 0  All input files were processed successfully.
>0  An error occurred.

4.25.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.25.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The cut and fold utilities can be used to create text files out of files with arbitrary line lengths. The cut utility should be used when the number of lines (or records) needs to remain constant. The fold utility should be used when the contents of long lines need to be kept contiguous.

The fold utility is frequently used to send text files to line printers that truncate, rather than fold, lines wider than the printer is able to print (usually 80 or 132 column positions).

Although terminal input in canonical processing mode requires the erase character (frequently set to <backspace>) to erase the previous character (not byte or column position), terminal output is not buffered and is extremely difficult, if not impossible, to parse correctly; the interpretation depends entirely on the physical device that will actually display, print, or store the output. In all known internationalized implementations, the utilities producing output for mixed column-width output assume that a <backspace> backs up one column position, and they output enough <backspace>s to get back to the start of the character when <backspace> is used to provide local line motions to support underlining and emboldening operations. Since fold without the -b option is dealing with these same constraints, <backspace> is always treated as backing up one column position rather than backing up one character.
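The width-counting rules that fold applies when -b is not specified can be modeled by a small shell function. Those rules are: reset to zero on <carriage-return>, back up one column (never below zero) on <backspace>, and advance to the next tab stop on <tab>. This is a hypothetical sketch, not part of the standard. The function name fold_count is invented here, and the function assumes every other character occupies exactly one column position (for example, ASCII text in the POSIX locale).

```shell
#!/bin/sh
# Hypothetical sketch of fold's line-width count (no -b option).
# Assumption: every character other than <backspace>, <carriage-return>,
# and <tab> is one column wide. Prints the count after consuming all of $1.
fold_count() {
    s=$1
    count=0
    bs=$(printf '\b')
    cr=$(printf '\r')
    tab=$(printf '\t')
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        s=$rest
        case $c in
            "$cr")  count=0 ;;                      # back to the line start
            "$bs")  if [ "$count" -gt 0 ]; then     # back up one column,
                        count=$((count - 1))        # but never below zero
                    fi ;;
            "$tab") count=$(( (count / 8 + 1) * 8 )) ;;  # next char lands in column 9, 17, ...
            *)      count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}

fold_count "$(printf 'ab\tc')"    # prints 9: two columns, tab to 8, then one more
```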
An example invocation that submits a file of possibly long lines to the line printer (under the assumption that the user knows the line width of the printer to be assigned by lp):

    fold -w 132 bigfile | lp

History of Decisions Made

Historical versions of the fold utility assumed one byte was one character and occupied one column position when written out. This is no longer always true. Since the most common usage of fold is believed to be folding long lines for output to limited-length output devices, this capability was preserved as the default case. The -b option was added so that applications could fold files with arbitrary length lines into text files that could then be processed by the utilities in this standard. Note that although the width for the -b option is in bytes, a line will never be split in the middle of a character. (It is unspecified what happens if a width is specified that is too small to hold a single character found in the input followed by a <newline>.)

The use of a hyphen as an option to specify standard input was removed from an earlier draft because it adds no functionality and is not historical practice.

The tab stops are hardcoded to be every eighth column to meet historical practice. No new method of specifying other tab stops was invented.

END_RATIONALE

4.26 getconf - Get configuration values

4.26.1 Synopsis

    getconf system_var
    getconf path_var pathname

4.26.2 Description

In the first synopsis form, the getconf utility shall write to the standard output the value of the variable specified by the system_var operand.

In the second synopsis form, the getconf utility shall write to the standard output the value of the variable specified by the path_var operand for the path specified by the pathname operand.
The value of each configuration variable shall be determined as if it were obtained by calling the function from which it is defined to be available by this standard or by POSIX.1 {8} (see Operands). The value shall reflect conditions in the current operating environment.

4.26.3 Options

None.

4.26.4 Operands

The following operands shall be supported by the implementation:

system_var
    A name of a configuration variable whose value is available from the function defined in 7.8.1 [such as confstr() in the C binding] or from the POSIX.1 {8} sysconf() function, the name of one of the additional POSIX.2 variables described in 7.8.2 as available from the sysconf() function, or the name of a minimum value specified by POSIX.1 {8} or POSIX.2 for one of these variables. The configuration variables and minimum values listed in the:

      - Name column of Table 2-16 (Utility Limit Minimum Values)
      - Name column of Table 2-17 (Symbolic Utility Limits)
      - Name column of Table 2-18 (Optional Facility Configuration Values)
      - Name column of POSIX.1 {8} Table 2-3 (Minimum Values)
      - Name column of POSIX.1 {8} Table 2-4 (Run-Time Increasable Values)
      - Variable column of POSIX.1 {8} Table 4-2 (Configurable System Variables), except that CLK_TCK need not be supported,

    without the enclosing braces, and PATH [corresponding to the confstr() name value _CS_PATH] shall be recognized as valid system_var operands. The implementation may support additional system_var operand values.

path_var
    A name of a configuration variable whose value is available from the POSIX.1 {8} pathconf() function.
    The configuration variables listed in the Variable column of the POSIX.1 {8} Table 5-2 (Configurable Pathname Variables), without the enclosing braces, shall be recognized as valid path_var operands. The implementation may support additional path_var operand values.

pathname
    A pathname for which the variable specified by path_var is to be determined.

4.26.5 External Influences

4.26.5.1 Standard Input

None.

4.26.5.2 Input Files

None.

4.26.5.3 Environment Variables

The following environment variables shall affect the execution of getconf:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.26.5.4 Asynchronous Events

Default.
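The second synopsis form is the one to use when a limit can vary from one file system to another. The hypothetical function below is a hedged sketch: it asks getconf for {NAME_MAX} in a given directory. It treats the "undefined" output described in 4.26.6.1 as meaning no fixed limit, in the manner of the PATH_MAX example in the rationale (4.26.10). The name name_fits is invented for illustration. The sketch also assumes a single-byte locale, because ${#2} counts characters while {NAME_MAX} counts bytes.

```shell
#!/bin/sh
# Hypothetical helper: succeed if filename $2 fits within {NAME_MAX} for
# directory $1. Returns 1 if the name is too long, 2 if getconf fails.
# Assumption: a single-byte locale, so characters and bytes coincide.
name_fits() {
    max=$(getconf NAME_MAX "$1") || return 2   # invalid variable or error
    if [ "$max" = undefined ]; then
        return 0                               # no fixed limit for this path
    fi
    [ "${#2}" -le "$max" ]
}

if name_fits /tmp report.txt; then
    echo "report.txt is short enough for /tmp"
fi
```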
4.26.6 External Effects

4.26.6.1 Standard Output

If the specified variable is defined on the system and its value is described to be available from the function in 7.8.1, its value shall be written in the following format:

    "%s\n", <value>

Otherwise, if the specified variable is defined on the system, its value shall be written in the following format:

    "%d\n", <value>

If the specified variable is valid, but is undefined on the system, getconf shall write using the following format:

    "undefined\n"

If the variable name is invalid or an error occurs, nothing shall be written to standard output.

4.26.6.2 Standard Error

Used only for diagnostic messages.

4.26.6.3 Output Files

None.

4.26.7 Extended Description

None.

4.26.8 Exit Status

The getconf utility shall exit with one of the following values:

 0  The specified variable is valid and information about its current state was written successfully.
>0  An error occurred.

4.26.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.26.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The original need for this utility, and for the confstr() function, was to provide a way of finding the configuration-defined default value for the PATH environment variable. Since PATH can be modified by the user to include directories that could contain utilities replacing the POSIX.2 standard utilities, shell scripts need a way to determine the system-supplied PATH environment variable value that contains the correct search path for the standard utilities. It was later suggested that access to the other variables described here could also be useful to applications.

This example illustrates the value of {NGROUPS_MAX}:

    getconf NGROUPS_MAX

This example illustrates the value of {NAME_MAX} for a specific directory:

    getconf NAME_MAX /usr

This example shows how to deal more carefully with results that might be unspecified:

    if value=$(getconf PATH_MAX /usr); then
        if [ "$value" = "undefined" ]; then
            echo PATH_MAX in /usr is infinite.
        else
            echo "PATH_MAX in /usr is $value."
        fi
    else
        echo Error in getconf.
    fi

Note that:

    sysconf(_SC_POSIX_C_BIND);

and:

    system("getconf POSIX2_C_BIND");

in a C program could give different answers. The sysconf() call supplies a value that corresponds to the conditions when the program was either compiled or executed, depending on the implementation; the system() call to getconf always supplies a value corresponding to conditions when the program is executed.

History of Decisions Made

This utility was renamed from posixconf during balloting because the new name expresses its purpose more specifically and does not unduly restrict the scope of application of the utility.

The functionality of this utility would not be adequately subsumed by another command such as

    grep var /etc/conf

because such a strategy would provide correct values neither for those variables that can vary at run time nor for those that can vary depending on the path.

Previous versions of this utility specified exit status 1 when the specified variable was valid, but not defined on the system. The output string "undefined" is now used to specify this case with exit code 0 because so many things depend on an exit code of zero when an invoked utility is successful.
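As the rationale notes, the original purpose of getconf was to let a script recover the configuration-defined PATH even when the user's PATH has been changed. A minimal sketch of that use follows. The function name run_std is hypothetical, and running the command in a subshell leaves the caller's own PATH unchanged.

```shell
#!/bin/sh
# Hypothetical helper: run a command using only the configuration-defined
# default PATH reported by "getconf PATH", so that directories added by
# the user cannot substitute different utilities.
run_std() {
    (
        std_path=$(getconf PATH) || exit 1   # getconf failed: give up
        PATH=$std_path
        export PATH
        exec "$@"                            # look up $1 in the standard PATH
    )
}

run_std ls -d /
```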
END_RATIONALE

4.27 getopts - Parse utility options

4.27.1 Synopsis

    getopts optstring name [arg ...]

4.27.2 Description

The getopts utility can be used to retrieve options and option-arguments from a list of parameters. It shall support the utility argument syntax guidelines 3 through 10, inclusive, described in 2.10.2.

Each time it is invoked, the getopts utility shall place the value of the next option in the shell variable specified by the name operand and the index of the next argument to be processed in the shell variable OPTIND. Whenever the shell is invoked, OPTIND shall be initialized to 1.

When the option requires an option-argument, the getopts utility shall place it in the shell variable OPTARG. If no option was found, or if the option that was found does not have an option-argument, OPTARG shall be unset.

If an option character not contained in the optstring operand is found where an option character is expected, the shell variable specified by name shall be set to the question-mark (?) character. In this case, if the first character in optstring is a colon (:), the shell variable OPTARG shall be set to the option character found, but no output shall be written to standard error; otherwise, the shell variable OPTARG shall be unset and a diagnostic message shall be written to standard error. This condition shall be considered to be an error detected in the way arguments were presented to the invoking application, but shall not be an error in getopts processing.

If an option-argument is missing:

  - If the first character of optstring is a colon, the shell variable specified by name shall be set to the colon character and the shell variable OPTARG shall be set to the option character found.
  - Otherwise, the shell variable specified by name shall be set to the question-mark character, the shell variable OPTARG shall be unset, and a diagnostic message shall be written to standard error. This condition shall be considered to be an error detected in the way arguments were presented to the invoking application, but shall not be an error in getopts processing; a diagnostic message shall be written as stated, but the exit status shall be zero.

When the end of options is encountered, the getopts utility shall exit with a return value greater than zero; the shell variable OPTIND shall be set to the index of the first nonoption-argument, where the first -- argument is considered to be an option-argument if there are no other nonoption-arguments appearing before it, or the value $# + 1 if there are no nonoption-arguments; the name variable shall be set to the question-mark character. Any of the following shall identify the end of options: the special option --, finding an argument that does not begin with a -, or encountering an error.

The shell variables OPTIND and OPTARG shall be local to the caller of getopts and shall not be exported by default.

The shell variable specified by the name operand, OPTIND, and OPTARG shall affect the current shell execution environment; see 3.12.

If the application sets OPTIND to the value 1, a new set of parameters can be used: either the current positional parameters or new arg values. Any other attempt to invoke getopts multiple times in a single shell execution environment with parameters (positional parameters or arg operands) that are not the same in all invocations, or with an OPTIND value modified to be a value other than 1, produces unspecified results.

4.27.3 Options

None.

4.27.4 Operands

The following operands shall be supported by the implementation:

optstring
    A string containing the option characters recognized by the utility invoking getopts.
    If a character is followed by a colon, the option shall be expected to have an argument, which should be supplied as a separate argument. Applications should specify an option character and its option-argument as separate arguments, but getopts shall interpret the characters following an option character requiring arguments as an argument whether or not this is done. An explicit null option-argument need not be recognized if it is not supplied as a separate argument when getopts is invoked. [See also the getopt() Description in B.7.] The characters question-mark and colon shall not be used as option characters by an application. The use of other option characters that are not alphanumeric produces unspecified results. If the option-argument is not supplied as a separate argument from the option character, the value in OPTARG shall be stripped of the option character and the '-'. The first character in optstring shall determine how getopts shall behave if an option character is not known or an option-argument is missing. See 4.27.2.

name
    The name of a shell variable that shall be set by the getopts utility to the option character that was found. See 4.27.2.

The getopts utility by default shall parse positional parameters passed to the invoking shell procedure. If args are given, they shall be parsed instead of the positional parameters.

4.27.5 External Influences

4.27.5.1 Standard Input

None.

4.27.5.2 Input Files

None.

4.27.5.3 Environment Variables

The following environment variables shall affect the execution of getopts:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

OPTIND
    This variable shall be used by the getopts utility as the index of the next argument to be processed.

4.27.5.4 Asynchronous Events

Default.

4.27.6 External Effects

4.27.6.1 Standard Output

None.

4.27.6.2 Standard Error

Whenever an error is detected and the first character in the optstring operand is not a colon (:), a diagnostic message shall be written to standard error with the following information in an unspecified format:

  - The invoking program name shall be identified in the message. The invoking program name shall be the value of the shell special parameter 0 (see 3.5.2) at the time the getopts utility is invoked. A name equivalent to

        basename "$0"

    may be used.

  - If an option is found that was not specified in optstring, this error shall be identified and the invalid option character shall be identified in the message.

  - If an option requiring an option-argument is found, but an option-argument is not found, this error shall be identified and the invalid option character shall be identified in the message.

4.27.6.3 Output Files

None.

4.27.7 Extended Description

None.
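Two rules from 4.27.2 can be combined in a shell function that parses its own arguments and reports errors itself. The first is the leading-colon convention: name is set to ? or :, and OPTARG holds the offending option character. The second is that setting OPTIND to 1 starts a new scan. The sketch below is illustrative only: the function name parse_ab and its messages are hypothetical. Because the function resets OPTIND, it does not preserve the caller's value. A caller that is also using getopts would need to save and restore OPTIND, as the rationale in 4.27.10 suggests.

```shell
#!/bin/sh
# Hypothetical function that parses "-a" and "-b value" from its own
# arguments. The leading ':' in the optstring suppresses getopts's own
# diagnostics, so the function writes its own messages.
parse_ab() {
    OPTIND=1                  # start a fresh scan of this function's arguments
    aflag=
    bval=
    while getopts :ab: opt; do
        case $opt in
            a)  aflag=1 ;;
            b)  bval=$OPTARG ;;
            :)  echo "missing argument for -$OPTARG" >&2; return 2 ;;
            \?) echo "unknown option -$OPTARG" >&2; return 2 ;;
        esac
    done
    shift $((OPTIND - 1))     # drop the options that were processed
    echo "a=${aflag:-0} b=${bval:-none} rest=$*"
}

parse_ab -a -b value one two    # prints: a=1 b=value rest=one two
```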
4.27.8 Exit Status

The getopts utility shall exit with one of the following values:

 0  An option, specified or unspecified by optstring, was found.
>0  The end of options was encountered or an error occurred.

4.27.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.27.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The getopts utility was chosen in preference to the getopt utility specified in System V because getopts handles option-arguments containing <blank> characters.

Since getopts affects the current shell execution environment, it is generally provided as a shell regular built-in. If it is called in a subshell or separate utility execution environment, such as one of the following:

    (getopts abc value "$@")
    nohup getopts ...
    find . -exec getopts ... \;

it will not affect the shell variables in the caller's environment.

Note that shell functions share OPTIND with the calling shell even though the positional parameters are changed. Functions that want to use getopts to parse their arguments will usually want to save the value of OPTIND on entry and restore it before returning. However, there will be cases when a function will want to change OPTIND for the calling shell.

The following example script parses and displays its arguments:

    aflag=
    bflag=
    while getopts ab: name
    do
        case $name in
        a)    aflag=1;;
        b)    bflag=1
              bval="$OPTARG";;
        ?)    printf "Usage: %s: [-a] [-b value] args\n" "$0"
              exit 2;;
        esac
    done
    if [ ! -z "$aflag" ]; then
        printf "Option -a specified\n"
    fi
    if [ ! -z "$bflag" ]; then
        printf 'Option -b "%s" specified\n' "$bval"
    fi
    shift $(($OPTIND - 1))
    printf "Remaining arguments are: %s\n" "$*"

History of Decisions Made

The OPTARG variable is not mentioned in the Environment Variables subclause because it does not affect the execution of getopts; it is one of the few ``output-only'' variables used by the standard utilities.

Use of colon (:) as an option character (in a previous draft) was new behavior and violated the syntax guidelines. Many objectors felt that it did not add enough to getopts to warrant mandating the extension to existing practice. The colon is now specified to behave as in the KornShell version of the getopts utility; when used as the first character in the optstring operand, it disables diagnostics concerning missing option-arguments and unexpected option characters. This replaces the use of the OPTERR variable that was specified in an earlier draft.

The formats of the diagnostic messages produced by the getopts utility and the getopt() function are not fully specified because implementations with superior (``friendlier'') formats objected to the formats used by some historical implementations. It was felt to be important that the information in the messages be uniform between getopts and getopt(). Exact duplication of the messages might not be possible, particularly if a utility is built on another system that has a different getopt() function, but the messages must have specific information included so that the program name, invalid option character, and type of error can be distinguished by a user.

Only a rare application program will intercept a getopts standard error message and want to parse it.
Therefore, implementations are free to choose the most usable messages they can devise. The following formats are used by many historical implementations:

    "%s: illegal option -- %c\n", <program name>, <option character>

    "%s: option requires an argument -- %c\n", <program name>, <option character>

Historical shells with built-in versions of getopt() or getopts have used different formats, frequently not even indicating the option character found in error.

END_RATIONALE

4.28 grep - File pattern searcher

4.28.1 Synopsis

    grep [ -E | -F ] [ -c | -l | -q ] [-insvx] -e pattern_list ... [-f pattern_file] ... [file ...]
    grep [ -E | -F ] [ -c | -l | -q ] [-insvx] [-e pattern_list] ... -f pattern_file ... [file ...]
    grep [ -E | -F ] [ -c | -l | -q ] [-insvx] pattern_list [file ...]

Obsolescent Versions:

    egrep [ -c | -l ] [-inv] -e pattern_list [file ...]
    egrep [ -c | -l ] [-inv] -f pattern_file [file ...]
    egrep [ -c | -l ] [-inv] pattern_list [file ...]
    fgrep [ -c | -l ] [-invx] -e pattern_list [file ...]
    fgrep [ -c | -l ] [-invx] -f pattern_file [file ...]
    fgrep [ -c | -l ] [-invx] pattern_list [file ...]

4.28.2 Description

The grep utility shall search the input files, selecting lines matching one or more patterns; the types of patterns shall be controlled by the options specified. The patterns are specified by the -e option, -f option, or the pattern_list operand.
The pattern_list's value shall consist of one or more patterns separated by <newline>s; the pattern_file's contents shall consist of one or more patterns terminated by <newline>s. By default, an input line shall be selected if any pattern, treated as an entire basic regular expression (BRE) as described in 2.8.3, matches any part of the line; a null BRE shall match every line. By default, each selected input line shall be written to the standard output.

Regular expression matching shall be based on text lines. Since a <newline> separates or terminates patterns (see the -e and -f options below), regular expressions cannot contain a <newline> character. Similarly, since patterns are matched against individual lines of the input, there is no way for a pattern to match a <newline> found in the input.

A command invoking the (obsolescent) egrep utility with the -e option specified shall be equivalent to the command:

    grep -E [ -c | -l ] [-inv] -e pattern_list [file ...]

A command invoking the egrep utility with the -f option specified shall be equivalent to the command:

    grep -E [ -c | -l ] [-inv] -f pattern_file [file ...]

A command invoking the egrep utility with the pattern_list operand specified shall be equivalent to the command:

    grep -E [ -c | -l ] [-inv] pattern_list [file ...]

A command invoking the (obsolescent) fgrep utility with the -e option specified shall be equivalent to the command:

    grep -F [ -c | -l ] [-invx] -e pattern_list [file ...]

A command invoking the fgrep utility with the -f option specified shall be equivalent to the command:

    grep -F [ -c | -l ] [-invx] -f pattern_file [file ...]
A command invoking the fgrep utility with the pattern_list operand specified shall be equivalent to the command:

    grep -F [ -c | -l ] [-invx] pattern_list [file ...]

4.28.3 Options

The grep utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-E    Match using extended regular expressions. Treat each pattern specified as an ERE, as described in 2.8.4. If any entire ERE pattern matches an input line, the line shall be matched. A null ERE shall match every line.

-F    Match using fixed strings. Treat each pattern specified as a string instead of a regular expression. If an input line contains any of the patterns as a contiguous sequence of bytes, the line shall be matched. A null string shall match every line.

-c    Write only a count of selected lines to standard output.

-e pattern_list
      Specify one or more patterns to be used during the search for input. Patterns in pattern_list shall be separated by a <newline>. A null pattern can be specified by two adjacent <newline>s in pattern_list; in the obsolescent forms, adjacent <newline>s in pattern_list produce undefined results. Unless the -E or -F option is also specified, each pattern shall be treated as a BRE, as described in 2.8.3. In the nonobsolescent forms, multiple -e and -f options shall be accepted by the grep utility. All of the specified patterns shall be used when matching lines, but the order of evaluation is unspecified.

-f pattern_file
      Read one or more patterns from the file named by the pathname pattern_file. Patterns in pattern_file shall be terminated by a <newline>. A null pattern can be specified by an empty line in pattern_file. Unless the -E or -F option is also specified, each pattern shall be treated as a BRE, as described in 2.8.3.

-i    Perform pattern matching in searches without regard to case. See 2.8.2.

-l    (The letter ell.) Write only the names of files containing selected lines to standard output. Pathnames shall be written once per file searched. If the standard input is searched, a pathname of "(standard input)" shall be written, in the POSIX Locale. In other locales, "standard input" may be replaced by something more appropriate in those locales.

-n    Precede each output line by its relative line number in the file, each file starting at line 1. The line number counter shall be reset for each file processed.

-q    Quiet. Do not write anything to the standard output, regardless of matching lines. Exit with zero status if an input line is selected.

-s    Suppress the error messages ordinarily written for nonexistent or unreadable files. Other error messages shall not be suppressed.

-v    Select lines not matching any of the specified patterns. If the -v option is not specified, selected lines shall be those that match any of the specified patterns.

-x    Consider only input lines that use all characters in the line to match an entire fixed string or regular expression to be matching lines.

4.28.4 Operands

The following operands shall be supported by the implementation:

pattern_list
      Specify one or more patterns to be used during the search for input. This operand shall be treated as if it were specified as -e pattern_list (see 4.28.3).

file  A pathname of a file to be searched for the pattern(s). If no file operands are specified, the standard input shall be used.
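The following shell sketch is not part of the standard. It illustrates several of the options above on a small sample file. The directory variable dir and the file contents are hypothetical. The lines named in the comments assume an implementation that conforms to the option descriptions in 4.28.3.

```shell
#!/bin/sh
# Sketch only: exercises several grep options on a hypothetical sample file.
dir=${TMPDIR:-/tmp}/grep-options.$$
mkdir -p "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT

printf '%s\n' 'POSIX shell' 'posix utilities' abc abcdef def > "$dir/text"

# One -e option-argument holding two BREs separated by a <newline>.
grep -e 'abc
def' "$dir/text"                         # abc, abcdef, def

# The same two patterns given as repeated -e options.
grep -e abc -e def "$dir/text"           # abc, abcdef, def

# -F treats patterns as fixed strings; -x requires the whole line to match.
grep -F -x -e abc -e def "$dir/text"     # abc, def

# -i ignores case; -c writes only the count of selected lines.
grep -i -c posix "$dir/text"             # 2

# -v selects lines that match none of the patterns.
grep -v -e abc -e def "$dir/text"        # POSIX shell, posix utilities
```

Repeating -e is the nonobsolescent way to supply several patterns. Embedding a <newline> inside one option-argument is equivalent.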
4.28.5 External Influences

4.28.5.1 Standard Input

The standard input shall be used only if no file operands are specified. See Input Files.

4.28.5.2 Input Files

The input files shall be text files.

4.28.5.3 Environment Variables

The following environment variables shall affect the execution of grep:

LANG  This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_COLLATE
      This variable shall determine the locale for the behavior of ranges, equivalence classes, and multicharacter collating elements within regular expressions.

LC_CTYPE
      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments) and the behavior of character classes within regular expressions.

LC_MESSAGES
      This variable shall determine the language in which messages should be written.

4.28.5.4 Asynchronous Events

Default.
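Here is a brief sketch, not part of the standard, assuming a POSIX-conforming sh and grep. With no file operands, grep reads the standard input. In the POSIX Locale (selected here with LC_ALL=C), the -l option names that input "(standard input)". The last two commands show that -q reports its result only through the exit status.

```shell
#!/bin/sh
# Sketch: no file operand, so grep searches the standard input.
printf '%s\n' alpha beta gamma | LC_ALL=C grep -l bet     # (standard input)

# -q writes nothing to standard output; the exit status carries the
# result (0: a line was selected, 1: none was selected, >1: error).
printf '%s\n' alpha beta | grep -q beta && echo 'beta selected'
printf '%s\n' alpha beta | grep -q gamma || echo "gamma not selected (exit status $?)"
```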
4.28.6 External Effects

4.28.6.1 Standard Output

If the -l option is in effect, and the -q option is not, a single output line shall be written for each file containing at least one selected input line:

    "%s\n", file

Otherwise, if more than one file argument appears, and -q is not specified, the grep utility shall prefix each output line by:

    "%s:", file

The remainder of each output line shall depend on the other options specified:

 - If the -c option is in effect, the remainder of each output line shall contain:

       "%d\n", <count>

 - Otherwise, if -c is not in effect and the -n option is in effect, the following shall be written to standard output:

       "%d:", <line number>

 - Finally, the following shall be written to standard output:

       "%s", <selected-line contents>

4.28.6.2 Standard Error

Used only for diagnostic messages.

4.28.6.3 Output Files

None.

4.28.7 Extended Description

None.

4.28.8 Exit Status

The grep utility shall exit with one of the following values:

 0    One or more lines were selected.
 1    No lines were selected.
>1    An error occurred.

4.28.9 Consequences of Errors

If the -q option is specified, the exit status shall be zero if an input line is selected, even if an error was detected. Otherwise, default actions shall be performed.

BEGIN_RATIONALE

4.28.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

This grep has been enhanced in an upward-compatible way to provide the exact functionality of the historical egrep and fgrep commands as well. It was the clear intention of the working group to consolidate the three greps into a single command.
The old egrep and fgrep commands are likely to be supported for many years to come as implementation extensions, allowing existing applications to operate unmodified.

To find all uses of the word Posix (in any case) in the file text.mm, and write with line numbers:

grep -i -n posix text.mm

To find all empty lines in the standard input:

grep ^$

or:

grep -v .

Both of the following commands print all lines containing strings abc or def or both:

grep -E 'abc
def'

grep -F 'abc
def'

Both of the following commands print all lines matching exactly abc or def:

grep -E '^abc$
^def$'

grep -F -x 'abc
def'

History of Decisions Made

The -e pattern_list option has the same effect as the pattern_list operand, but is useful when pattern_list begins with a hyphen. It is also useful when it is more convenient to provide multiple patterns as separate arguments.

Earlier drafts did not show that the -c, -l, and -q options were mutually exclusive. This has been fixed to more closely align with historical practice and documentation.

Historical implementations usually silently ignored all but one of multiply specified -e and -f options, but were not consistent as to which specification was actually used. POSIX.2 requires that the nonobsolescent forms accept multiple -e and -f options and use all of the patterns specified while matching input text lines. [Note that the order of evaluation is not specified. If an implementation finds a null string as a pattern, it is allowed to use that pattern first (matching every line) and effectively ignore any other patterns.]

The -b option was removed from the Options subclause, since block numbers are implementation dependent.

The System V restriction on using - to mean standard input was lifted.
The action taken when given a null RE or ERE is now specified; this is an error condition in some historical implementations.

The -l option previously indicated that its use was undefined when no files were explicitly named. This behavior was historical and placed an unnecessary restriction on future implementations. It has been removed.

The -q option was added at the suggestion of members of the balloting group as a means of easily determining whether or not a pattern (or string) exists in a group of files. When searching several files, it provides a performance improvement (because it can quit as soon as it finds the first match) and requires less care by the user in choosing the set of files to supply as arguments (because it will exit zero if it finds a match even if grep detected an access or read error on earlier file operands).

The historical BSD grep -s option practice is easily duplicated by redirecting standard output to /dev/null. The -s option required here is from System V.

The -x option, historically available only with fgrep, is available here for all of the nonobsolescent versions.

END_RATIONALE

4.29 head - Copy the first part of files

4.29.1 Synopsis

    head [-n number] [file ...]

Obsolescent version:

    head [-number] [file ...]

4.29.2 Description

The head utility shall copy its input files to the standard output, ending the output for each file at a designated point.

Copying shall end at the point in each input file indicated by the -n number option (or the obsolescent version's -number argument). The option-argument number shall be counted in units of lines.
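The following shell sketch is an informal illustration, not part of the standard. It covers the description above, the -n option in 4.29.3, and the header format in 4.29.6.1. It copies the first lines of two small files. The directory and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: head with the default count, with -n, and with two file operands.
dir=${TMPDIR:-/tmp}/head-sketch.$$
mkdir -p "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT

i=1
while [ "$i" -le 12 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$dir/a"
printf '%s\n' x y > "$dir/b"

head "$dir/a"              # as if -n 10 were given: line 1 through line 10
head -n 3 "$dir/a"         # line 1, line 2, line 3

# Two file operands: each file's output is preceded by a
# "==> pathname <==" header, and every header after the first is
# preceded by an empty line.
head -n 1 "$dir/a" "$dir/b"
```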
4.29.3 Options

The head utility shall conform to the utility argument syntax guidelines described in 2.10.2, except that the obsolescent version accepts multicharacter numeric options.

The following option shall be supported by the implementation in the nonobsolescent version:

-n number
      The first number lines of each input file shall be copied to standard output. The number option-argument shall be a positive decimal integer.

If no options are specified, head shall act as if -n 10 had been specified.

In the obsolescent version, the following option shall be supported by the implementation:

-number
      The number argument is a positive decimal integer with the same effect as the -n number option in the nonobsolescent version.

4.29.4 Operands

The following operand shall be supported by the implementation:

file  A pathname of an input file. If no file operands are specified, the standard input shall be used.

4.29.5 External Influences

4.29.5.1 Standard Input

The standard input shall be used only if no file operands are specified. See Input Files.

4.29.5.2 Input Files

Input files shall be text files, but the line length shall not be restricted to {LINE_MAX} bytes.

4.29.5.3 Environment Variables

The following environment variables shall affect the execution of head:

LANG  This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
LC_ALL
      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

LC_MESSAGES
      This variable shall determine the language in which messages should be written.

4.29.5.4 Asynchronous Events

Default.

4.29.6 External Effects

4.29.6.1 Standard Output

The standard output shall contain designated portions of the input file(s).

If multiple file operands are specified, head shall precede the output for each with the header:

    "\n==> %s <==\n", <pathname>

except that the first header written shall not include the initial <newline>.

4.29.6.2 Standard Error

Used only for diagnostic messages.

4.29.6.3 Output Files

None.

4.29.7 Extended Description

None.

4.29.8 Exit Status

The head utility shall exit with one of the following values:

 0    Successful completion.
>0    An error occurred.

4.29.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.29.10 Rationale. (This subclause is not a part of P1003.2)

Usage, Examples

The nonobsolescent version of head was created to allow conformance to the Utility Syntax Guidelines. The -n option was added to this new interface so that head and tail would be more logically related.

To write the first ten lines of all files (except those with a leading period) in the current directory:

head *

History of Decisions Made

The head utility was not in early drafts.
It was felt that head, and its frequent companion, tail, were useful mostly to interactive users, and not application programs. However, balloting input suggested that these utilities actually do find significant use in scripts, such as to write out portions of log files. Although it is possible to simulate head with sed 10q for a single file, the working group decided that the popularity of head on historical BSD systems warranted its inclusion alongside tail.

An earlier draft had the synopsis line:

    head [ -c | -l ] [-n number] [file ...]

This was changed to the current form based on comments and objections noting that -c has not been provided by historical versions of head, and that other utilities in POSIX.2 provide similar functionality. Also, -l was changed to -n to match a similar change in tail.

END_RATIONALE

4.30 id - Return user identity

4.30.1 Synopsis

    id [user]
    id -G [-n] [user]
    id -g [-nr] [user]
    id -u [-nr] [user]

4.30.2 Description

If no user operand is provided, the id utility shall write the user and group IDs and the corresponding user and group names of the invoking process to standard output. If the effective and real IDs do not match, both shall be written. If multiple groups are supported by the underlying system (see the description of {NGROUPS_MAX} in POSIX.1 {8}), the supplementary group affiliations of the invoking process also shall be written.

If a user operand is provided and the process has the appropriate privileges, the user and group IDs of the selected user shall be written. In this case, effective IDs shall be assumed to be identical to real IDs.
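You can observe this behavior with the sketch below. It assumes a POSIX-conforming id and uses the -G, -g, -n, -r, and -u options specified in 4.30.3. The sketch is not part of the standard. The IDs and names it prints depend entirely on the system, and the sample output line in the comment is hypothetical.

```shell
#!/bin/sh
# Sketch: capturing single values from id with command substitution.
uid=$(id -u)     # effective user ID
gid=$(id -g)     # effective group ID
ruid=$(id -ru)   # real user ID
echo "effective uid=$uid, real uid=$ruid, effective gid=$gid"

# All group IDs (effective, real, supplementary) on one line.
id -G

# Names instead of numbers, where the system can map them.
id -un
id -Gn

# Default form in the POSIX Locale; on a hypothetical system, e.g.:
#   uid=1000(alice) gid=1000(alice) groups=1000(alice),27(wheel)
LC_ALL=C id
```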
If the selected user has more than one allowable group membership listed in the group database (see POSIX.1 {8} section 9.1), these shall be written in the same manner as the supplementary groups described above.

4.30.3 Options

The id utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-G    Output all different group IDs (effective, real, and supplementary) only, using the format "%u\n". If there is more than one distinct group affiliation, output each such affiliation, using the format " %u", before the <newline> is output.

-g    Output only the effective group ID, using the format "%u\n".

-n    Output the name in the format "%s" instead of the numeric ID using the format "%u".

-r    Output the real ID instead of the effective ID.

-u    Output only the effective user ID, using the format "%u\n".

4.30.4 Operands

The following operand shall be supported by the implementation:

user  The login name for which information is to be written.

4.30.5 External Influences

4.30.5.1 Standard Input

None.

4.30.5.2 Input Files

None.

4.30.5.3 Environment Variables

The following environment variables shall affect the execution of id:

LANG  This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
LC_CTYPE
      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
      This variable shall determine the language in which messages should be written.

4.30.5.4 Asynchronous Events

Default.

4.30.6 External Effects

4.30.6.1 Standard Output

The following formats shall be used when the LC_MESSAGES locale category specifies the POSIX Locale. In other locales, the strings uid, gid, euid, egid, and groups may be replaced with more appropriate strings corresponding to the locale.

    "uid=%u(%s) gid=%u(%s)\n", <real user ID>, <user-name>, <real group ID>, <group-name>

If the effective and real user IDs do not match, the following shall be inserted immediately before the \n character in the previous format:

    " euid=%u(%s)"

with the following arguments added at the end of the argument list:

    <effective user ID>, <effective user-name>

If the effective and real group IDs do not match, the following shall be inserted directly before the \n character in the format string (and after any addition resulting from the effective and real user IDs not matching):

    " egid=%u(%s)"

with the following arguments added at the end of the argument list:

    <effective group ID>, <effective group-name>

If the process has supplementary group affiliations or the selected user is allowed to belong to multiple groups, the first such group shall be added directly before the <newline> character in the format string:
    " groups=%u(%s)"

with the following arguments added at the end of the argument list:

    <supplementary group ID>, <supplementary group name>

and the necessary number of the following added after that for any remaining supplementary group IDs:

    ",%u(%s)"

and the necessary number of the following arguments added at the end of the argument list:

    <supplementary group ID>, <supplementary group name>

If any of the user ID, group ID, effective user ID, effective group ID, or supplementary/multiple group IDs cannot be mapped by the system into printable user or group names, the corresponding (%s) and name argument shall be omitted from the corresponding format string.

When any of the options are specified, the output format shall be as described under 4.30.3.

4.30.6.2 Standard Error

Used only for diagnostic messages.

4.30.6.3 Output Files

None.

4.30.7 Extended Description

None.

4.30.8 Exit Status

The id utility shall exit with one of the following values:

 0    Successful completion.
>0    An error occurred.

4.30.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.30.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The functionality provided by the 4BSD groups utility can be simulated using:

id -Gn [user]

Note that output produced by the -G option and by the default case could potentially produce very long lines on systems that support large numbers of supplementary groups.
(On systems with user and group IDs that are 32-bit integers and with group names with a maximum of 8 bytes per name, 93 supplementary groups plus distinct effective and real group and user IDs could theoretically overflow the 2048-byte {LINE_MAX} text file line limit on the default output case. It would take about 186 supplementary groups to overflow the 2048-byte barrier using id -G.) This is not expected to be a problem in practice, but in cases where it is a concern, applications should consider using fold -s (see 4.25) before postprocessing the output of id.

History of Decisions Made

The 4BSD command groups was considered, but was not used as it did not provide the functionality of the id utility of the SVID. Also, it was thought that it would be easier to modify id to provide the additional functionality necessary for systems with multiple groups than to invent another command.

The options -u, -g, -n, and -r were added to ease the use of id with shell command substitution. Without these options it is necessary to use a postprocessor such as sed to select the desired piece of information. Since output such as that produced by id -u -n is wanted frequently, it seemed desirable to add the options.

END_RATIONALE

4.31 join - Relational database operator

4.31.1 Synopsis

    join [ -a file_number | -v file_number ] [-e string] [-o list] [-t char]
         [-1 field] [-2 field] file1 file2

Obsolescent version:

    join [-a file_number] [-e string] [-j field] [-j1 field] [-j2 field]
         [-o list ...] [-t char] file1 file2

4.31.2 Description

The join utility shall perform an "equality join" on the files file1 and file2.
The joined files shall be written to the standard output.

The "join field" is a field in each file on which the files are compared. There shall be one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line by default shall consist of the join field, then the remaining fields from file1, then the remaining fields from file2. This format can be changed by using the -o option (see below). The -a option can be used to add unmatched lines to the output. The -v option can be used to output only unmatched lines.

By default, the files file1 and file2 should be ordered in the collating sequence of sort -b (see 4.58) on the fields on which they are to be joined (by default, the first field in each line). All selected output shall be written in the same collating sequence.

The default input field separators shall be <blank>s. In this case, multiple separators shall count as one field separator, and leading separators shall be ignored. The default output field separator shall be a <space>.

The field separator and collating sequence can be changed by using the -t option (see below).

If the input files are not in the appropriate collating sequence, the results are unspecified.

4.31.3 Options

The join utility shall conform to the utility argument syntax guidelines described in 2.10.2. The obsolescent version does not follow the utility argument syntax guidelines: the -j1 and -j2 options are multicharacter options and the -o option takes multiple arguments.

The following options shall be supported by the implementation:

-a file_number
      Produce a line for each unpairable line in file file_number, where file_number is 1 or 2, in addition to the default output.
      If both -a 1 and -a 2 are specified, all unpairable lines shall be output.

-e string
      Replace empty output fields by the string string.

-j field
      (Obsolescent.) Equivalent to: -1 field -2 field

-j1 field
      (Obsolescent.) Equivalent to: -1 field

-j2 field
      (Obsolescent.) Equivalent to: -2 field

-o list
      Construct the output line to comprise the fields specified in list, each element of which has the form file_number.field, where file_number is a file number and field is a decimal integer field number. The elements of list are either comma- or <blank>-separated, as specified in Guideline 8 in 2.10.2. The fields specified by list shall be written for all selected output lines. Fields selected by list that do not appear in the input shall be treated as empty output fields. (See the -e option.) The join field shall not be written unless specifically requested. The list shall be a single command line argument. However, as an obsolescent feature, the argument list can be multiple arguments on the command line. If this is the case, and if the -o option is the last option before file1, and if file1 is of the form string.string, the results are undefined.

-t char
      Use character char as a separator, for both input and output. Every appearance of char in a line shall be significant. When this option is specified, the collating sequence should be the same as sort without the -b option.

-v file_number
      Instead of the default output, produce a line only for each unpairable line in file_number, where file_number is 1 or 2. If both -v 1 and -v 2 are specified, all unpairable lines shall be output.
-1 field
      Join on the fieldth field of file 1. Field numbers are decimal integers starting with 1.

-2 field
      Join on the fieldth field of file 2. Field numbers are decimal integers starting with 1.

4.31.4 Operands

The following operands shall be supported by the implementation:

file1, file2
      A pathname of a file to be joined. If either of the file1 or file2 operands is -, the standard input is used in its place.

4.31.5 External Influences

4.31.5.1 Standard Input

The standard input shall be used only if the file1 or file2 operand is -. See Input Files.

4.31.5.2 Input Files

The input files shall be text files.

4.31.5.3 Environment Variables

The following environment variables shall affect the execution of join:

LANG  This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_COLLATE
      This variable shall determine the collating sequence join expects to have been used when the input files were sorted.

LC_CTYPE
      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

LC_MESSAGES
      This variable shall determine the language in which messages should be written.

4.31.5.4 Asynchronous Events

Default.

4.31.6 External Effects

4.31.6.1 Standard Output

The join utility output shall be a concatenation of selected character fields.
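The following shell sketch is not part of the standard. It shows the default output and the -a, -v, -o, and -e options from 4.31.3 on two small files that are already sorted on their first field. The directory and file contents are hypothetical. The lines named in the comments follow from the option descriptions and output rules of this clause.

```shell
#!/bin/sh
# Sketch: joining two small files already sorted on field 1.
dir=${TMPDIR:-/tmp}/join-sketch.$$
mkdir -p "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT

printf '%s\n' 'a 1' 'b 2' 'c 3' > "$dir/f1"
printf '%s\n' 'a x' 'c y' 'd z' > "$dir/f2"

# Default: join field, remaining file1 fields, remaining file2 fields.
join "$dir/f1" "$dir/f2"                          # a 1 x / c 3 y

# -a 1 adds the unpairable lines of file 1.
join -a 1 "$dir/f1" "$dir/f2"                     # a 1 x / b 2 / c 3 y

# -v 2 writes only the unpairable lines of file 2.
join -v 2 "$dir/f1" "$dir/f2"                     # d z

# -o selects output fields; -e fills fields missing from unpaired lines.
join -o 2.2,1.2 "$dir/f1" "$dir/f2"               # x 1 / y 3
join -a 1 -e NULL -o 1.1,2.2 "$dir/f1" "$dir/f2"  # a x / b NULL / c y
```

The last command pairs -e with -o. As the rationale below notes, many historical implementations only honor -e when -o is also given.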
When the -o option is not specified, the output shall be:

    "%s%s%s\n", <join field>, <other file1 fields>, <other file2 fields>

If the join field is not the first field in either file, the <other file fields> are:

    <fields preceding join field>, <fields following join field>

When the -o option is specified, the output format shall be:

    "%s\n", <concatenation of fields>

where the concatenation of fields is described by the -o option, above.

For either format, each field (except the last) shall be written with its trailing separator character. If the separator is the default (<blank>s), a single <space> character shall be written after each field (except the last).

4.31.6.2 Standard Error

Used only for diagnostic messages.

4.31.6.3 Output Files

None.

4.31.7 Extended Description

None.

4.31.8 Exit Status

The join utility shall exit with one of the following values:

 0    All input files were output successfully.
>0    An error occurred.

4.31.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.31.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

Pathnames consisting of numeric digits should not be specified directly following the -o list.

The developers of the standard believed that join should operate as documented in the SVID and BSD, not as historically implemented. Historical implementations do not behave as documented in these areas:

(1) Most implementations of join require using the -o option when using the -e option.

(2) Most implementations do not parse the -o option as documented, and parse the elements as separate argv items, until the item is not of the form file_number.field.
This behavior is permitted as an obsolescent usage of the utility. To ensure maximum portability, file1 should not be of the form string.string. A suitable alternative to guarantee portability would be to put the -- flag before any file1 operand.

The obsolescent -j, -j1, and -j2 options have been described to show how they have been used in historical implementations. Earlier drafts showed -j file_number field, but a space was never allowed before the file_number and two option arguments were never intended.

History of Decisions Made

The ability to specify file2 as - is not historical practice; it was added for completeness.

As a result of a balloting comment, the -v option was added to the nonobsolescent version. This option was felt necessary because it permitted the writing of only those lines that do not match on the join field, as opposed to the -a option, which prints both lines that do and do not match. This additional facility is parallel with the -v option of grep.

END_RATIONALE

4.32 kill - Terminate or signal processes

4.32.1 Synopsis

    kill -s signal_name pid ...
    kill -l [exit_status]

Obsolescent Versions:

    kill [-signal_name] pid ...
    kill [-signal_number] pid ...

4.32.2 Description

The kill utility shall send a signal to the process(es) specified by each pid operand.

For each pid operand, the kill utility shall perform actions equivalent to the POSIX.1 {8} kill() function called with the following arguments:

(1) The value of the pid operand shall be used as the pid argument.
(2) The sig argument is the value specified by the -s option, -signal_number option, or the -signal_name option, or by SIGTERM, if none of these options is specified.

4.32.3 Options

The kill utility shall conform to the utility argument syntax guidelines described in 2.10.2, except that in the obsolescent form, the -signal_number and -signal_name options are usually more than a single character.

The following options shall be supported by the implementation:

-l    (The letter ell.) Write all values of signal_name supported by the implementation, if no operand is given. If an exit_status operand is given and it is a value of the ? shell special parameter (see 3.5.2 and wait in 4.70) corresponding to a process that was terminated by a signal, the signal_name corresponding to the signal that terminated the process shall be written. If an exit_status operand is given and it is the unsigned decimal integer value of a signal number, the signal_name (the POSIX.1 {8}-defined symbolic constant name without the SIG prefix) corresponding to that signal shall be written. Otherwise, the results are unspecified.

-s signal_name
      Specify the signal to send, using one of the symbolic names defined for Required Signals or Job Control Signals in POSIX.1 {8} 3.3.1.1. Values of signal_name shall be recognized in a case-independent fashion, without the SIG prefix. In addition, the symbolic name 0 shall be recognized, representing the signal value zero. The corresponding signal shall be sent instead of SIGTERM.

-signal_name
      (Obsolescent.) Equivalent to -s signal_name.

-signal_number
      (Obsolescent.)
     Specify a nonnegative decimal integer, signal_number, representing the signal to be used instead of SIGTERM, as the sig argument in the effective call to kill(). The correspondence between integer values and the sig value used is shown in the following table.

          signal_number    sig Value
          _____________    _________
                0          0
                1          SIGHUP
                2          SIGINT
                3          SIGQUIT
                6          SIGABRT
                9          SIGKILL
               14          SIGALRM
               15          SIGTERM

     The effects of specifying any signal_number other than those listed in the table are undefined.

In the obsolescent versions, if the first argument is a negative integer, it shall be interpreted as a -signal_number option, not as a negative pid operand specifying a process group.

4.32.4 Operands

The following operands shall be supported by the implementation:

pid
     A decimal integer specifying a process or process group to be signaled. The process(es) selected by positive, negative, and zero values of the pid operand shall be as described for the POSIX.1 {8} kill() function. If the first pid operand is negative, it should be preceded by -- to keep it from being interpreted as an option.

exit_status
     A decimal integer specifying a signal number or the exit status of a process terminated by a signal.

4.32.5 External Influences

4.32.5.1 Standard Input

None.

4.32.5.2 Input Files

None.

4.32.5.3 Environment Variables

The following environment variables shall affect the execution of kill:

LANG
     This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
LC_ALL
     This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
     This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
     This variable shall determine the language in which messages should be written.

4.32.5.4 Asynchronous Events

Default.

4.32.6 External Effects

4.32.6.1 Standard Output

When the -l option is not specified, the standard output shall not be used.

When the -l option is specified, the symbolic name of each signal shall be written in the following format:

     "%s%c", <signal_name>, <separator>

where the <signal_name> is in uppercase, without the SIG prefix, and the <separator> shall be either a <newline> or a <space>. For the last signal written, <separator> shall be a <newline>.

When both the -l option and exit_status operand are specified, the symbolic name of the corresponding signal shall be written in the following format:

     "%s\n", <signal_name>

4.32.6.2 Standard Error

Used only for diagnostic messages.

4.32.6.3 Output Files

None.

4.32.7 Extended Description

None.

4.32.8 Exit Status

The kill utility shall exit with one of the following values:

      0   At least one matching process was found for each pid operand, and the specified signal was successfully processed for at least one matching process.

     >0   An error occurred.

4.32.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.32.10 Rationale.
(This subclause is not a part of P1003.2)

Examples, Usage

Any of the commands

     kill -9 100 -165
     kill -s kill 100 -165
     kill -s KILL 100 -165

sends the SIGKILL signal to the process whose process ID is 100 and to all processes whose process group ID is 165, assuming the sending process has permission to send that signal to the specified processes, and that they exist.

POSIX.1 {8} and POSIX.2 do not require specific signal numbers for any signal_names. Even the -signal_number option provides symbolic (although numeric) names for signals. If a process is terminated by a signal, its exit status indicates the signal that killed it, but the exact values are not specified. The kill -l option, however, can be used to map decimal signal numbers and exit status values into the name of a signal. The following example reports the status of a terminated job:

     job
     stat=$?
     if [ "$stat" -eq 0 ]
     then
          echo job completed successfully.
     elif [ "$stat" -gt 128 ]
     then
          echo job terminated by signal SIG$(kill -l "$stat").
     else
          echo job terminated with error code $stat.
     fi

History of Decisions Made

The signal name extension was based on a desire to avoid limiting the kill utility to implementation-dependent values.

The -l option originated from the C-shell, and is also implemented in the KornShell. The C-shell output can consist of multiple output lines, because the signal names do not always fit on a single line on some terminal screens. The KornShell output also included the implementation-specific signal numbers, and was considered by the working group too difficult for scripts to parse conveniently.
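Because the symbolic name 0 selects the signal value zero, kill -s 0 performs the existence and permission checks of kill() without delivering any signal. The following is a minimal sketch, not part of this standard, of a shell function built on that behavior; the function name proc_exists is hypothetical. A nonzero exit status cannot distinguish a nonexistent process from one the caller lacks permission to signal.

```shell
# Sketch only: proc_exists is a hypothetical helper, not defined by POSIX.2.
# It relies on kill accepting the symbolic name 0 (signal value zero),
# which checks whether the process could be signaled without sending anything.
proc_exists() {
    # $1: a positive process ID.
    # Exit status 0: the process exists and the caller may signal it.
    # Nonzero: no such process, or permission to signal it was denied.
    kill -s 0 "$1" 2>/dev/null
}

# Example usage ("some_command" is a placeholder):
#   some_command &
#   pid=$!
#   if proc_exists "$pid"; then echo "$pid is still running"; fi
```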
The specified output format is intended not only to accommodate the historical C-shell output, but also to permit an entirely vertical or entirely horizontal listing on systems for which this is appropriate.

An earlier draft invented the name SIGNULL as a signal_name for signal 0 (used by POSIX.1 {8} to test for the existence of a process without sending it a signal). Since the signal_name "0" can be used in this case unambiguously, SIGNULL has been removed.

An earlier draft also required symbolic signal_names to be recognized with or without the SIG prefix. Historical versions of kill have not written the SIG prefix for the -l option and have not recognized the SIG prefix on signal_names. Since neither application portability nor ease of use would be improved by requiring this extension, it is no longer required.

POSIX.2 contains no utility that browses for process IDs. Values for pid are available via the ! and $ parameters of the shell command language (see 3.5.2).

The use of numeric signal values was the subject of a long debate in the Working Group. During balloting, it was determined that their use should be declared obsolescent, but retained to provide backward compatibility with existing applications.

Existing implementations of kill permit negative pid operands representing process groups, but this has often been poorly documented. The assumption that an initial negative number argument specifies a signal number (rather than a process group) is the existing behavior, and was retained.
Therefore, to send the default signal to a process group (say 123), an application should use a command similar to one of the following:

     kill -TERM -123
     kill -- -123

The -s option was added in response to international interest in providing some form of kill that meets the Utility Syntax Guidelines.

Some implementations provide kill only as a shell built-in utility and use that status to support the extension of killing background asynchronous lists (those started with &), by the use of job identifiers. For example, kill %1 would kill the first asynchronous list in the background. This standard does not require (but permits) such an extension, because other related job-control features are not provided by the shell, and because these facilities are not ordinarily usable in portable shell applications. This notation is expected to be introduced by the UPE.

END_RATIONALE

4.33 ln - Link files

4.33.1 Synopsis

     ln [-f] source_file target_file
     ln [-f] source_file ... target_dir

4.33.2 Description

In the first synopsis form, the ln utility shall create a new directory entry (link) for the file specified by the source_file operand, at the destination path specified by the target_file operand. This first synopsis form shall be assumed when the final operand does not name an existing directory; if more than two operands are specified and the final operand is not an existing directory, an error shall result.

In the second synopsis form, the ln utility shall create a new directory entry for each file specified by a source_file operand, at a destination path in the existing directory named by target_dir.
If the last operand specifies an existing file of a type not specified by POSIX.1 {8}, the behavior is implementation defined.

The corresponding destination path for each source_file shall be the concatenation of the target directory pathname, a slash character, and the last pathname component of the source_file. The second synopsis form shall be assumed when the final operand names an existing directory.

For each source_file:

     (1)  If the destination path exists:

          (a)  If the -f option is not specified, ln shall write a diagnostic message to standard error, do nothing more with the current source_file, and go on to any remaining source_files.

          (b)  Actions shall be performed equivalent to the POSIX.1 {8} unlink() function, called using destination as the path argument. If this fails for any reason, ln shall write a diagnostic message to standard error, do nothing more with the current source_file, and go on to any remaining source_files.

     (2)  Actions shall be performed equivalent to the POSIX.1 {8} link() function using source_file as the path1 argument, and the destination path as the path2 argument.

4.33.3 Options

The ln utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following option shall be supported by the implementation:

-f
     Force existing destination pathnames to be removed to allow the link.

4.33.4 Operands

The following operands shall be supported by the implementation:

source_file
     A pathname of a file to be linked. This can be a regular or special file; whether a directory can be linked is implementation defined.
target_file
     The pathname of the new directory entry to be created.

target_dir
     A pathname of an existing directory in which the new directory entries are to be created.

4.33.5 External Influences

4.33.5.1 Standard Input

None.

4.33.5.2 Input Files

None.

4.33.5.3 Environment Variables

The following environment variables shall affect the execution of ln:

LANG
     This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
     This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
     This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
     This variable shall determine the language in which messages should be written.

4.33.5.4 Asynchronous Events

Default.

4.33.6 External Effects

4.33.6.1 Standard Output

None.

4.33.6.2 Standard Error

Used only for diagnostic messages.

4.33.6.3 Output Files

None.

4.33.7 Extended Description

None.

4.33.8 Exit Status

The ln utility shall exit with one of the following values:

      0   All the specified files were linked successfully.

     >0   An error occurred.

4.33.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.33.10 Rationale.

(This subclause is not a part of P1003.2)

Examples, Usage

None.
History of Decisions Made

Some historical versions of ln (including the one specified by the SVID) unlink the destination file, if it exists, by default. If the mode does not permit writing, these versions prompt for confirmation before attempting the unlink. In these versions, the -f option causes ln not to prompt for confirmation. This allows ln to succeed in creating links when the target file already exists, even if the file itself is not writable (although the directory must be). Previous versions of this draft specified this functionality.

This draft does not allow the ln utility to unlink existing destination paths by default for the following reasons:

     -  The ln utility has traditionally been used to provide locking for shell applications, a usage that is incompatible with ln unlinking the destination path by default. There was no corresponding technical advantage to adding this functionality.

     -  This functionality gave ln the ability to destroy the link structure of files, which changes the historical behavior of ln.

     -  This functionality is easily replicated with a combination of rm and ln.

     -  It is not historical practice in many systems; BSD and BSD-derived systems do not support this behavior. Unfortunately, whichever behavior is selected can cause scripts written expecting the other behavior to fail.

     -  It is preferable that ln perform in the same manner as the link() function, which does not permit the target to already exist.

This standard retains the -f option to provide support for shell scripts depending on the SVID semantics. It seems likely that shell scripts would not be written to handle prompting by ln, and would therefore have specified the -f option.
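Two of the reasons listed above can be illustrated in the shell. The following minimal sketch, which is not part of this standard, shows a lock built on the fact that ln refuses to replace an existing destination when -f is not specified, and the rm and ln combination that replicates unlinking the destination first. The function names acquire_lock, release_lock, and force_link are hypothetical, and the sketch assumes the lock pathname lies in a writable directory on a file system that supports hard links.

```shell
# Sketch only: acquire_lock, release_lock, and force_link are hypothetical
# helper names, not utilities defined by POSIX.2.

# Take a lock by hard-linking a private file to a shared lock pathname.
# Without -f, ln fails if the destination already exists, so only one
# caller can create the link. Note: "tmp" is a global shell variable.
acquire_lock() {    # $1 = lock pathname
    tmp=$1.$$                   # private name, unique per process ID
    : > "$tmp" || return 1      # create an empty private file
    if ln "$tmp" "$1" 2>/dev/null; then
        rm -f "$tmp"            # the lock pathname keeps the file alive
        return 0
    fi
    rm -f "$tmp"                # someone else holds the lock
    return 1
}

release_lock() {    # $1 = lock pathname
    rm -f "$1"
}

# Replace an existing destination, as ln -f does: remove it first,
# then create the link (steps (1)(b) and (2) of 4.33.2).
force_link() {      # $1 = source_file, $2 = target_file
    rm -f "$2" && ln "$1" "$2"
}
```

Because the private file is created in the same directory as the lock pathname, the hard link never has to cross a file system boundary.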
It should also be noted that -f is an undocumented feature of many historical versions of the ln utility, allowing linking to directories. These versions will require modification.

Previous drafts of this standard also required an -i option, which behaved like the -i options in cp and mv, prompting for confirmation before unlinking existing files. This was not historical practice for the ln utility and has been deleted from this version.

Although symbolic links are not part of the standard, the -s option should be used only for the traditional purpose of creating symbolic links.

END_RATIONALE

4.34 locale - Get locale-specific information

4.34.1 Synopsis

     locale [ -a | -m ]
     locale [-ck] name ...

4.34.2 Description

The locale utility shall write information about the current locale environment, or all public locales, to the standard output. For the purposes of this clause, a public locale is one provided by the implementation that is accessible to the application.

When locale is invoked without any arguments, it shall summarize the current locale environment for each locale category as determined by the settings of the environment variables defined in 2.5.

When invoked with operands, it shall write values that have been assigned to the keywords in the locale categories, as follows:

     -  Specifying a keyword name shall select the named keyword and the category containing that keyword.

     -  Specifying a category name shall select the named category and all keywords in that category.

4.34.3 Options

The locale utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-a
     Write information about all available public locales.
     The available locales shall include POSIX, representing the POSIX Locale. The manner in which the implementation determines what other locales are available is implementation defined.

-c
     Write the names of selected locale categories; see 4.34.6.1.

-k
     Write the names and values of selected keywords. The implementation may omit values for some keywords; see 4.34.4.

-m
     Write names of available charmaps; see 2.4.1.

4.34.4 Operands

The following operand shall be supported by the implementation:

name
     The name of a locale category as defined in 2.5, the name of a keyword in a locale category, or the reserved name charmap. The named category or keyword shall be selected for output. If a single name represents both a locale category name and a keyword name in the current locale, the results are unspecified. Otherwise, both category and keyword names can be specified as name operands, in any sequence. It is implementation defined whether any keyword values are written for the categories LC_CTYPE and LC_COLLATE.

4.34.5 External Influences

4.34.5.1 Standard Input

None.

4.34.5.2 Input Files

None.

4.34.5.3 Environment Variables

The following environment variables shall affect the execution of locale:

LANG
     This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
     This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
     This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).
LC_MESSAGES
     This variable shall determine the language in which messages should be written.

The LANG and LC_* environment variables shall specify the current locale environment to be written out; they shall be used if the -a option is not specified.

4.34.5.4 Asynchronous Events

Default.

4.34.6 External Effects

4.34.6.1 Standard Output

If locale is invoked without any options or operands, the names and values of the LANG and LC_* environment variables described in this standard shall be written to the standard output, one variable per line, with LANG first, and each line using the following format. Only those variables set in the environment and not overridden by LC_ALL shall be written using this format:

     "%s=%s\n", <variable_name>, <value>

The names of those LC_* variables associated with locale categories defined in this standard that are not set in the environment or are overridden by LC_ALL shall be written in the following format:

     "%s=\"%s\"\n", <variable_name>, <implied value>

The <implied value> shall be the name of the locale that has been selected for that category by the implementation, based on the values in LANG and LC_ALL, as described in 2.6.

The <value> and <implied value> shown above shall be properly quoted for possible later re-entry to the shell. The <value> shall not be quoted using double-quotes (so that it can be distinguished by the user from the <implied value> case, which always requires double-quotes).

The LC_ALL variable shall be written last, using the first format shown above.
If it is not set, it shall be written as:

     "LC_ALL=\n"

If any arguments are specified:

     (1)  If the -a option is specified, the names of all the public locales shall be written, each in the following format:

               "%s\n", <locale name>

     (2)  If the -c option is specified, the name(s) of all selected categories shall be written, each in the following format:

               "%s\n", <category name>

          If keywords are also selected for writing (see following items), the category name output shall precede the keyword output for that category.

          If the -c option is not specified, the names of the categories shall not be written; only the keywords, as selected by the name operand, shall be written.

     (3)  If the -k option is specified, the name(s) and value(s) of selected keywords shall be written. If a value is nonnumeric, it shall be written in the following format:

               "%s=\"%s\"\n", <keyword name>, <keyword value>

          If the keyword was charmap, the name of the charmap (if any) that was specified via the localedef -f option when the locale was created shall be written, with the word charmap as <keyword name>.

          If a value is numeric, it shall be written in one of the following formats:

               "%s=%d\n", <keyword name>, <keyword value>

               "%s=%c%o\n", <keyword name>, <escape character>, <keyword value>

               "%s=%cx%x\n", <keyword name>, <escape character>, <keyword value>

          where the <escape character> is that identified by the escape_char keyword in the current locale; see 2.5.2.

          Compound keyword values (list entries) shall be separated in the output by semicolons.
          When included in keyword values, the semicolon, the double-quote, the backslash, and any control character shall be preceded (escaped) with the escape character.

     (4)  If the -k option is not specified, selected keyword values shall be written, each in the following format:

               "%s\n", <keyword value>

          If the keyword was charmap, the name of the charmap (if any) that was specified via the localedef -f option when the locale was created shall be written.

     (5)  If the -m option is specified, then a list of all available charmaps shall be written, each in the format

               "%s\n", <charmap>

          where <charmap> is in a format suitable for use as the option-argument to the localedef -f option.

4.34.6.2 Standard Error

Used only for diagnostic messages.

4.34.6.3 Output Files

None.

4.34.7 Extended Description

None.

4.34.8 Exit Status

The locale utility shall exit with one of the following values:

      0   All the requested information was found and output successfully.

     >0   An error occurred.

4.34.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.34.10 Rationale.

(This subclause is not a part of P1003.2)

Examples, Usage

In the following examples, the assumption is that locale environment variables are set as follows:

     LANG=locale_x
     LC_COLLATE=locale_y

The command:

     locale

would result in the following output:

     LANG=locale_x
     LC_CTYPE="locale_x"
     LC_COLLATE=locale_y
     LC_TIME="locale_x"
     LC_NUMERIC="locale_x"
     LC_MONETARY="locale_x"
     LC_MESSAGES="locale_x"
     LC_ALL=

The order of presentation of the categories is not specified by this standard.
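Because an <implied value> is always enclosed in double-quotes and a <value> never is, a script can tell which categories were set explicitly in the environment. The following minimal sketch, not part of this standard, reads output in the no-arguments format from standard input and writes the names of the variables whose values are implied; the function name implied_categories is hypothetical.

```shell
# Sketch only: implied_categories is a hypothetical helper.
# Reads lines in the format written by "locale" with no arguments and
# writes the name of each variable whose value is double-quoted, that is,
# an <implied value> rather than a value set in the environment.
implied_categories() {
    while IFS= read -r line; do
        name=${line%%=*}    # text before the first '='
        value=${line#*=}    # text after the first '='
        case $value in
            \"*) printf '%s\n' "$name" ;;
        esac
    done
}

# Example usage:
#   locale | implied_categories
```

For the example output shown above, this would write LC_CTYPE, LC_TIME, LC_NUMERIC, LC_MONETARY, and LC_MESSAGES, one per line; LANG, LC_COLLATE, and LC_ALL are written in the unquoted format and are therefore skipped.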
The command

     LC_ALL=POSIX locale -ck decimal_point

would produce:

     LC_NUMERIC
     decimal_point="."

The following command shows an application of locale to determine whether a user-supplied response is affirmative:

     if printf "%s\n" "$response" | grep -Eq "$(locale yesexpr)"
     then
          affirmative processing goes here
     else
          nonaffirmative processing goes here
     fi

If the LANG environment variable is not set or is set to an empty value, or one of the LC_* environment variables is set to an unrecognized value, the actual locales assumed (if any) are implementation defined as described in 2.6.

Implementations are not required to write out the actual values for keywords in the categories LC_CTYPE and LC_COLLATE; however, they must write out the categories (allowing an application to determine, e.g., which character classes are available).

History of Decisions Made

This command was added in Draft 9 to resolve objections to the lack of a way for applications to determine what locales are available, a way to examine the contents of existing public locales, a way to retrieve specific locale items, and a way to recognize affirmative and negative responses in an international environment. In Draft 10 it was cut back considerably in answer to balloting objections about its complexity and its requirement of features not useful for application programs.

The format for the no-arguments case was expanded to show the implied values of the categories as an aid to the novice user; without them, the output was of little more value than that from env.

Based on the questionable value in a shell script of getting an entire array of characters back, and the problem of returning a collation description that makes sense, short of a complete localedef source, the output from requests for categories LC_CTYPE and LC_COLLATE has been made implementation defined.
The -m option has been added to allow applications to query for the existence of charmaps. The output is a list of the charmaps (implementation-supplied and user-supplied, if any) on the system.

The -c option was included for readability when more than one category is selected (e.g., via more than one keyword name or via a category name). It is valid both with and without the -k option.

The charmap keyword, which returns the name of the charmap (if any) that was used when the current locale was created, was introduced to allow applications needing the information to retrieve it.

END_RATIONALE

4.35 localedef - Define locale environment

4.35.1 Synopsis

     localedef [-c] [-f charmap] [-i sourcefile] name

4.35.2 Description

The localedef utility shall convert source definitions for locale categories into a format usable by the functions and utilities whose operational behavior is determined by the setting of the locale environment variables defined in 2.5. It is implementation defined whether users shall have the capability to create new locales, in addition to those supplied by the implementation. If the symbolic constant {POSIX2_LOCALEDEF} is defined, then the system supports the creation of new locales. In a system not supporting this capability, the localedef utility shall terminate with an exit code of 3.

The utility shall read source definitions for one or more locale categories belonging to the same locale from the file named in the -i option (if specified) or from standard input.

The name operand identifies the target locale. The utility shall support the creation of public, or generally accessible locales, as well as private, or restricted-access locales. Implementations may restrict the capability to create or modify public locales to users with the appropriate privileges.
Each category source definition shall be identified by the corresponding environment variable name and terminated by an END category-name statement. The following categories shall be supported. In addition, the input may contain source for implementation-defined categories.

LC_CTYPE
     Defines character classification and case conversion.

LC_COLLATE
     Defines collation rules.

LC_MONETARY
     Defines the format and symbols used in formatting of monetary information.

LC_NUMERIC
     Defines the decimal delimiter, grouping, and grouping symbol for nonmonetary numeric editing.

LC_TIME
     Defines the format and content of date and time information.

LC_MESSAGES
     Defines the format and values of affirmative and negative responses.

4.35.3 Options

The localedef utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-c
     Create permanent output even if warning messages have been issued.

-f charmap
     Specify the pathname of a file containing a mapping of character symbols and collating element symbols to actual character encodings. The format of the charmap is described under 2.4.1. This option shall be specified if symbolic names (other than collating symbols defined in a collating-symbol keyword) are used. If the -f option is not present, an implementation-defined default character mapping file shall be used.

-i inputfile
     The pathname of a file containing the source definitions. If this option is not present, source definitions shall be read from standard input. The format of the inputfile is described in 2.5.2.
4.35.4 Operands

The following operand shall be supported by the implementation:

name
     Identifies the locale. See 2.5 for a description of the use of this name. If the name contains one or more slash characters, name shall be interpreted as a pathname where the created locale definition(s) shall be stored. If name does not contain any slash characters, the interpretation of the name is implementation defined and the locale shall be public. This capability may be restricted to users with appropriate privileges.

4.35.5 External Influences

4.35.5.1 Standard Input

Unless the -i option is specified, the standard input shall be a text file containing one or more locale category source definitions, as described in 2.5.2. When lines are continued using the escape character mechanism, there is no limit to the length of the accumulated continued line.

4.35.5.2 Input Files

The character set mapping file specified as the charmap option-argument is described under 2.4.1. If a locale category source definition contains a copy statement, as defined in 2.5.2, and the copy statement names a valid, existing locale, then localedef shall behave as if the source definition had contained a valid category source definition for the named locale.

4.35.5.3 Environment Variables

The following environment variables shall affect the execution of localedef:

LANG
     This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
     This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_. and LC_* variables as described in 2.6.
LC_COLLATE
    (This variable shall have no effect on localedef; the POSIX Locale shall be used for this category.)

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of argument data as characters (e.g., single- versus multibyte characters). This variable shall have no effect on the processing of localedef input data; the POSIX Locale shall be used for this purpose, regardless of the value of this variable.

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.35.5.4 Asynchronous Events

Default.

4.35.6 External Effects

4.35.6.1 Standard Output

The utility shall report all categories successfully processed, in an unspecified format.

4.35.6.2 Standard Error

Used only for diagnostic messages.

4.35.6.3 Output Files

The format of the created output is unspecified. If the name operand does not contain a slash, the existence of an output file for the locale is unspecified.

4.35.7 Extended Description

None.

4.35.8 Exit Status

The localedef utility shall exit with one of the following values:

0     No errors occurred and the locale(s) were successfully created.
1     Warnings occurred and the locale(s) were successfully created.
2     The locale specification exceeded implementation limits or the coded character set or sets used were not supported by the implementation, and no locale was created.
3     The capability to create new locales is not supported by the implementation.
>3    Warnings or errors occurred and no output was created.

4.35.9 Consequences of Errors

If an error is detected, no permanent output shall be created. If warnings occur, permanent output shall be created if the -c option was specified.
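A script that builds locales can act on the exit status values listed in 4.35.8. The following POSIX shell function is a minimal sketch, not part of this standard: the name localedef_status and its message texts are hypothetical, and the function only classifies a status value that the caller obtained from localedef; it does not run localedef itself.

```sh
#!/bin/sh
# Hypothetical helper: describe a localedef exit status (see 4.35.8).
# Prints a short description; returns 0 if a locale was created, 1 if not.
localedef_status() {
    case $1 in
        0) echo "locale created, no errors"; return 0 ;;
        1) echo "locale created, with warnings"; return 0 ;;
        2) echo "limits exceeded or code set unsupported; no locale created" ;;
        3) echo "locale creation not supported" ;;
        *) echo "warnings or errors; no output created" ;;
    esac
    return 1
}
```

For example (the file and locale names are placeholders):

    localedef -c -f mycharmap -i mysource ./mylocale
    localedef_status $?

Per 4.35.9, output is kept after warnings only when -c is specified.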
The following conditions shall cause warning messages to be issued:

 -  If a symbolic name not found in the charmap file is used for the descriptions of the LC_CTYPE or LC_COLLATE categories (for other categories, this shall be an error condition).

 -  If the number of operands to the order keyword exceeds the {COLL_WEIGHTS_MAX} limit.

 -  If optional keywords not supported by the implementation are present in the source.

Other implementation-defined conditions may also cause warnings.

BEGIN_RATIONALE

4.35.10 Rationale. (This subclause is not a part of P1003.2)

Usage, Examples

The output produced by the localedef utility is implementation defined. The name operand is used to identify the specific locale. (As a consequence, although several categories can be processed in one execution, only categories belonging to the same locale can be processed.)

The charmap definition is optional, and is contained outside the locale definition. This allows both completely ``self-defined'' source files, and ``generic'' sources (applicable to more than one code set). To aid portability, all charmap definitions shall use the same symbolic names for the portable character set.

As explained in 2.4.1, it is implementation defined whether or not users or applications can provide additional character set description files. Therefore, the -f option might be operable only when an implementation-provided charmap is named.

History of Decisions Made

This description is based on work performed in the UniForum Technical Committee Subcommittee on Internationalization.
The localedef utility is provided as a standard, portable interface for implementations that allow users to create new locales, in addition to implementation-supplied ones. The ability to create new locales and categories, already available in many commercial implementations of POSIX compliant systems, provides the means by which application providers can develop portable applications that use standard interfaces to adjust the behavior of the application to language and culture differences.

END_RATIONALE

4.36 logger - Log messages

4.36.1 Synopsis

    logger string ...

4.36.2 Description

The logger utility saves a message, in an unspecified manner and format, containing the string operands provided by the user. The messages are expected to be evaluated later by personnel performing system administration tasks.

4.36.3 Options

None.

4.36.4 Operands

The following operands shall be supported by the implementation:

string
    One of the string arguments whose contents are concatenated together, in the order specified, separated by single <space>s.

4.36.5 External Influences

4.36.5.1 Standard Input

None.

4.36.5.2 Input Files

None.

4.36.5.3 Environment Variables

The following environment variables shall affect the execution of logger:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
    This variable shall determine the language in which diagnostic messages should be written.

4.36.5.4 Asynchronous Events

Default.

4.36.6 External Effects

4.36.6.1 Standard Output

None.

4.36.6.2 Standard Error

Used only for diagnostic messages.

4.36.6.3 Output Files

Unspecified.

4.36.7 Extended Description

None.

4.36.8 Exit Status

The logger utility shall exit with one of the following values:

0     Successful completion.
>0    An error occurred.

4.36.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.36.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

This utility allows logging of information for later use by a system administrator or programmer in determining why noninteractive utilities have failed. POSIX.2 makes no requirements for the locations of the saved messages, their format, or retention period. It also provides no method for a portable application to read messages, once written. (It is expected that the POSIX.7 System Administration standard will have something to say about that.)

The purpose of this utility might best be illustrated by an example. A batch application, running noninteractively, tries to read a configuration file and fails; it may attempt to notify the system administrator with:

    logger myname: unable to read file foo.
    [time stamp]

The text with LC_MESSAGES about diagnostic messages means diagnostics from logger to the user or application, not diagnostic messages that the user is sending to the system administrator.

History of Decisions Made

Multiple string arguments were allowed, similar to echo, for ease of use. In Draft 9, the posixlog utility was renamed logger to match its BSD forebear, with which it is (downward) compatible.

The working group believed strongly that some method of alerting administrators to errors was necessary. The obvious example is a batch utility, running noninteractively, that is unable to read its configuration files, or that is unable to create or write its results file. However, the working group did not wish to define the format or delivery mechanisms, as they have historically been (and will probably continue to be) very system specific, as well as involving functionality clearly outside of the scope of this standard.

Like the utilities mailx and lp, logger is admittedly difficult to test. This was not deemed sufficient justification to exclude these utilities from the standard. It is also arguable that they are, in fact, testable, but that the tests themselves are not portable.

END_RATIONALE

4.37 logname - Return user's login name

4.37.1 Synopsis

    logname

4.37.2 Description

The logname utility shall write the user's login name to standard output. The login name shall be the string that would be returned by the POSIX.1 {8} getlogin() function. Under the conditions where the getlogin() function would fail, the logname utility shall write a diagnostic message to standard error and exit with a nonzero exit status.

4.37.3 Options

None.

4.37.4 Operands

None.

4.37.5 External Influences

4.37.5.1 Standard Input

None.
4.37.5.2 Input Files

None.

4.37.5.3 Environment Variables

The following environment variables shall affect the execution of logname:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.37.5.4 Asynchronous Events

Default.

4.37.6 External Effects

4.37.6.1 Standard Output

The logname utility output shall be a single line consisting of the user's login name:

    "%s\n", <login name>

4.37.6.2 Standard Error

Used only for diagnostic messages.

4.37.6.3 Output Files

None.

4.37.7 Extended Description

None.

4.37.8 Exit Status

The logname utility shall exit with one of the following values:

0     Successful completion.
>0    An error occurred.

4.37.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.37.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The logname utility explicitly ignores the LOGNAME environment variable because environment changes could produce erroneous results.

History of Decisions Made

The passwd file is not listed as required, because the implementation may have other means of mapping login names.

END_RATIONALE
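Because logname exits with a nonzero status whenever getlogin() would fail (4.37.2), a script that needs a login name should test that status rather than assume a name is always available. The following POSIX shell function is a minimal sketch of such a check; the name login_name_or_default and its fallback argument are hypothetical. In keeping with the logname rationale, it does not fall back to the LOGNAME environment variable.

```sh
#!/bin/sh
# Hypothetical helper: print the login name reported by logname, or the
# fallback given as $1 if logname fails or prints nothing.
# LOGNAME is deliberately not consulted.
login_name_or_default() {
    if name=$(logname 2>/dev/null) && [ -n "$name" ]; then
        printf '%s\n' "$name"
    else
        printf '%s\n' "$1"
    fi
}
```

A batch script might use it as user=$(login_name_or_default unknown) before composing a message for logger.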
4.38 lp - Send files to a printer

4.38.1 Synopsis

    lp [-c] [-d dest] [-n copies] [file ...]

4.38.2 Description

The lp utility shall copy the input files to an output device in an unspecified manner. The default output destination should be to a hardcopy device, such as a printer or microfilm recorder, that produces nonvolatile, human-readable documents. If such a device is not available to the application, or if the system provides no such device, the lp utility shall exit with a nonzero exit status.

The actual writing to the output device may occur some time after the lp utility successfully exits. During the portion of the writing that corresponds to each input file, the implementation shall guarantee exclusive access to the device.

4.38.3 Options

The lp utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-c
    Exit only after further access to any of the input files is no longer required. The application can then safely delete or modify the files without affecting the output operation.

-d dest
    Specify a string that names the output device or destination. If -d is not specified, and neither the LPDEST nor PRINTER environment variable is set, an unspecified output device is used. The -d dest option shall take precedence over LPDEST, which in turn shall take precedence over PRINTER. Results are undefined when dest contains a value that is not a valid device or destination name.

-n copies
    Write copies number of copies of the files, where copies is a positive decimal integer.
    The methods for producing multiple copies and for arranging the multiple copies when multiple file operands are used are unspecified, except that each file shall be output as an integral whole, not interleaved with portions of other files.

4.38.4 Operands

The following operands shall be supported by the implementation:

file
    A pathname of a file to be output. If no file operands are specified, or if a file operand is -, the standard input shall be used. If a file operand is used, but the -c option is not specified, the process performing the writing to the output device may have user and group permissions that differ from those of the process invoking lp.

4.38.5 External Influences

4.38.5.1 Standard Input

The standard input shall be used only if no file operands are specified, or if a file operand is -. See Input Files.

4.38.5.2 Input Files

The input files shall be text files.

4.38.5.3 Environment Variables

The following environment variables shall affect the execution of lp:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).
LC_MESSAGES
    This variable shall determine the language in which messages should be written.

LPDEST
    This variable shall be interpreted as a string that names the output device or destination. If the LPDEST environment variable is not set, the PRINTER environment variable shall be used. The -d dest option shall take precedence over LPDEST. Results are undefined when -d is not specified and LPDEST contains a value that is not a valid device or destination name.

PRINTER
    This variable shall be interpreted as a string that names the output device or destination. If the LPDEST and PRINTER environment variables are not set, an unspecified output device is used. The -d dest option and the LPDEST environment variable shall take precedence over PRINTER. Results are undefined when -d is not specified, LPDEST is unset, and PRINTER contains a value that is not a valid device or destination name.

4.38.5.4 Asynchronous Events

Default.

4.38.6 External Effects

4.38.6.1 Standard Output

A message concerning the identification or status of the print request may be written, in an unspecified format.

4.38.6.2 Standard Error

Used only for diagnostic messages.

4.38.6.3 Output Files

None.

4.38.7 Extended Description

None.

4.38.8 Exit Status

The lp utility shall exit with one of the following values:

0     All input files were processed successfully.
>0    No output device was available, or an error occurred.

4.38.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.38.10 Rationale.
(This subclause is not a part of P1003.2)

Examples, Usage

Since the default destination, device type, queueing mechanisms, and acceptable forms of input are all unspecified, usage guidelines for what a portable application can do are as follows:

 (1)  Use the command in a pipeline, or with -c, so that there are no permission problems and the files can be safely deleted or modified.

 (2)  Limit output to text files of reasonable line lengths and printable characters and include no device-specific formatting information, such as a page description language. The meaning of ``reasonable'' in this context can only be answered as a quality of implementation issue, but should be apparent from historical usage patterns in the industry and the locale. The pr and fold utilities can be used to achieve reasonable formatting for the implementation's default page size.

Alternatively, the application can arrange its installation in such a way that requires the system administrator or operator to provide the appropriate information on lp options and environment variable values.

At a minimum, having this utility in the standard tells the industry that portable applications require a means to print output and provides at least a command name and LPDEST routing mechanism that can be used for discussions between vendors, application writers, and users. The use of ``should'' in the Description clearly shows the working group's intent, even if it cannot mandate that all systems (such as laptops) have printers.

Examples:

To print a file named file:

    lp -c file

To print multiple files with headers:

    pr file1 file2 | lp

On most existing implementations of lp, an option is provided to pass printer-specific options to the daemon handling the printer.
It is not specified here because the printer-specific options are widespread and in conflict, the lp specified here is not required to even have a queueing mechanism, and the choice of options varies widely from printer to printer. Nonetheless, implementors are encouraged to use this mechanism where appropriate:

-o option
    Specifies an implementation-defined option that controls the specific operation of the printer.

The following options could be used for the meanings below if the hardware is capable of supporting the option.

    option   Meaning
    ------   ------------------------------------
    lp2      two logical pages per physical page
    lp4      four logical pages per physical page
    d        double sided

POSIX.2 does not specify what the ownership of the process performing the writing to the output device may be. If -c is not used, it is unspecified whether the process performing the writing to the output device will have permission to read file if there are any restrictions in place on who may read file until after it is printed. Also, if -c is not used, the results of deleting file before it is printed are unspecified.

History of Decisions Made

The lp utility was designed to be a basic version of a utility that is already available in many historical implementations. The working group felt that it should be implementable simply as:

    cat "$@" > /dev/lp

after appropriate processing of options, if that is how the implementation chose to do it and if exclusive access could be granted (so that two users did not write to the device simultaneously).
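The minimal implementation described above can be sketched as a POSIX shell function. It is illustrative only and rests on stated assumptions: the name mini_lp is hypothetical, a destination name is treated directly as a pathname, /dev/lp is an assumed default device, no locking is done (so the sketch does not by itself provide the exclusive access required by 4.38.2), and it uses mktemp, which is widely available but is not specified by this standard. It does show the destination precedence of 4.38.3 and 4.38.5.3 (-d dest, then LPDEST, then PRINTER) and writes each copy of each file as an integral whole.

```sh
#!/bin/sh
# Hypothetical minimal lp in the spirit of: cat "$@" > /dev/lp
# Assumptions: a destination is an ordinary pathname, /dev/lp is the
# default device, and no locking is done (no exclusive-access guarantee).
mini_lp() (
    dest= copies=1 tmp=
    OPTIND=1
    while getopts cd:n: opt; do
        case $opt in
            c) ;;                         # input is fully read before returning
            d) dest=$OPTARG ;;
            n) copies=$OPTARG ;;
            *) exit 2 ;;                  # getopts has already written a diagnostic
        esac
    done
    shift $((OPTIND - 1))

    case $copies in
        ''|*[!0-9]*) copies=0 ;;          # not a decimal integer: rejected below
    esac
    if [ "$copies" -lt 1 ]; then
        echo "mini_lp: copies must be a positive decimal integer" >&2
        exit 2
    fi

    [ $# -eq 0 ] && set -- -              # no file operands: use standard input

    # Precedence: -d dest, then LPDEST, then PRINTER, then the assumed default.
    : "${dest:=${LPDEST:-${PRINTER:-/dev/lp}}}"

    # Remove the spool file for standard input, if one was created.
    trap 'if [ -n "$tmp" ]; then rm -f "$tmp"; fi' EXIT

    for f in "$@"; do
        if [ "$f" = - ]; then
            # Spool standard input once so that every copy can be reproduced.
            if [ -z "$tmp" ]; then
                tmp=$(mktemp) || exit 1
            fi
            cat > "$tmp" || exit 1
            f=$tmp
        fi
        i=0
        while [ "$i" -lt "$copies" ]; do
            cat "$f" || exit 1            # each copy is written as a whole
            i=$((i + 1))
        done
    done > "$dest"
)
```

For example, pr file1 file2 | mini_lp -n 2 would write two copies of the paginated output to the destination named by LPDEST or, if LPDEST is unset, PRINTER. The function body is a subshell so that its variables and trap do not leak into the caller.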
Although in the future the working group may add other options to this utility, it should always be able to execute with no options or operands and send the standard input to an unspecified output device.

The standard makes no representations concerning the format of the printed output, except that it must be ``human-readable'' and ``nonvolatile.'' Thus, writing by default to a disk or tape drive or a display terminal would not qualify. (Such destinations are not prohibited when -d dest, LPDEST, or PRINTER are used, however.)

A portable application will use one of the file operands only with the -c option or if the file is publicly readable and guaranteed to be available at the time of printing. This is because the standard gives the implementation the freedom to queue up the request for printing at some later time by a different process that might not be able to access the file.

The standard is worded such that a ``print job'' consisting of multiple input files, possibly in multiple copies, is guaranteed to print so that any one file is not jumbled up with another, but there is no statement that all the files or copies have to print out together.

The -c option may imply a spooling operation, but this is not required. The utility can be implemented to simply wait until the printer is ready and then wait until it is finished. Because of that, there is no attempt to define a queueing mechanism (priorities, classes of output, etc.).

The -n and -d options were added in response to balloting objections that too little historical value was being provided. Although the historical System V lp and BSD lpr utilities have provided similar functionality, they used different names for the environment variable specifying the destination printer. Since the name of the utility here is lp, LPDEST (used by the System V lp utility) was given precedence over PRINTER (used by the BSD lpr utility).
Since environments of users frequently contain one or the other environment variable, the lp utility is required to recognize both. If this were not done, many applications would send output to unexpected output devices when users moved from system to system.

Some have commented that lp has far too little functionality to make it worthwhile. Requests have proposed additional options or operands or both that added functionality. The requests included:

 -  wording requiring the output to be ``hardcopy''

 -  a requirement for multiple printers

 -  options for PostScript, dimpress, hp, and lineprint formats

Given that a POSIX.2 compliant system is not required to even have a printer, placing further restrictions upon the behavior of the printer is not useful. Since hardcopy format is so application dependent, it is difficult, if not impossible, to select a reasonable subset of functionality that should be required on all POSIX.2 compliant systems.

The term ``unspecified'' is used in this clause in lieu of ``implementation defined'' as most known implementations would not be able to say anything fully useful in their conformance documents: the existence and usage of printers is very dependent on how the system administrator configures each individual system.

END_RATIONALE

4.39 ls - List directory contents

4.39.1 Synopsis

    ls [-CFRacdilqrtu1] [file ...]

4.39.2 Description

For each operand that names a file of a type other than directory, ls shall write the name of the file as well as any requested, associated information. For each operand that names a file of type directory, ls shall write the names of files contained within that directory, as well as any requested, associated information. If no operands are specified, the contents of the current directory shall be written.
If more than one operand is specified, nondirectory operands shall be written first; directory and nondirectory operands shall be sorted separately according to the collating sequence in the current locale.

4.39.3 Options

The ls utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-C
    Write multi-text-column output with entries sorted down the columns, according to the collating sequence. The number of text columns and the column separator characters are unspecified, but should be adapted to the nature of the output device.

-F
    Write a slash (/) immediately after each pathname that is a directory, an asterisk (*) after each that is executable, and a vertical bar (|) after each that is a FIFO.

-R
    Recursively list subdirectories encountered.

-a
    Write out all directory entries, including those whose names begin with a period (.). Entries beginning with a period (.) shall not be written out unless explicitly referenced, the -a option is supplied, or an implementation-defined condition causes them to be written.

-c
    Use time of last modification of the file status information (see POSIX.1 {8} 5.6.1.3) instead of last modification of the file itself for sorting (-t) or writing (-l).

-d
    Do not treat directories differently than other types of files. The use of -d with -R produces unspecified results.

-i
    For each file, write the file's file serial number (see POSIX.1 {8} 5.6.2).

-l
    (The letter ell.) Write out in long format (see 4.39.6.1). When -l (ell) is specified, -1 (one) shall be assumed.

-q
    Force each instance of nonprintable filename characters and <tab>s to be written as the question-mark (?) character.
    Implementations may provide this option by default if the output is to a terminal device.

-r
    Reverse the order of the sort to get reverse collating sequence or oldest first.

-t
    Sort by time modified (most recently modified first) before sorting the operands by the collating sequence.

-u
    Use time of last access (see POSIX.1 {8} 5.6.1.3) instead of last modification of the file for sorting (-t) or writing (-l).

-1
    (The numeric digit one.) Force output to be one entry per line.

Specifying more than one of the options in the following mutually exclusive pairs shall not be considered an error: -C and -l (ell), -C and -1 (one), -c and -u. The last option specified in each pair shall determine the output format.

4.39.4 Operands

The following operands shall be supported by the implementation:

file
    A pathname of a file to be written. If the file specified is not found, a diagnostic message shall be output on standard error.

4.39.5 External Influences

4.39.5.1 Standard Input

None.

4.39.5.2 Input Files

None.

4.39.5.3 Environment Variables

The following environment variables shall affect the execution of ls:

COLUMNS
    This variable shall determine the user's preferred column position width for writing multiple-text-column output. If this variable contains a string representing a decimal integer, the ls utility shall calculate how many pathname text columns to write (see -C) based on the width provided. If COLUMNS is not set or invalid, an implementation-defined number of column positions shall be assumed, based on the implementation's knowledge of the output device. The column width chosen to write the names of files in any given directory shall be constant. File names shall not be truncated to fit into the multiple-text-column output.

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_COLLATE
    This variable shall determine the locale for character collation information in determining the pathname collation sequence.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments) and which characters are defined as printable (character class print).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

LC_TIME
    This variable shall determine the format and contents for date and time strings written by ls.

TZ
    This variable shall determine the time zone for date and time strings written by ls.

4.39.5.4 Asynchronous Events

Default.

4.39.6 External Effects

4.39.6.1 Standard Output

The default format shall be to list one entry per line to standard output; the exceptions are to terminals or when the -C option is specified. If the output is to a terminal, the format is implementation defined.
If the -i option is specified, the file's file serial number (see POSIX.1 {8} 5.6.1) shall be written in the following format before any other output for the corresponding entry:

    "%u ", <file serial number>

If the -l option is specified, the following information shall be written:

    "%s %u %s %s %u %s %s\n", <file mode>, <number of links>,
        <owner name>, <group name>, <number of bytes in the file>,
        <date and time>, <pathname>

If <owner name> or <group name> cannot be determined, they shall be replaced with their associated numeric values using the format "%u".

The <date and time> field shall contain the appropriate date and time stamp of when the file was last modified. In the POSIX Locale, the field shall be the equivalent of the output of the following date command (see 4.15):

    date "+%b %e %H:%M"

if the file has been modified in the last six months, or:

    date "+%b %e  %Y"

(where two <space> characters are used between %e and %Y) if the file has not been modified in the last six months or if the modification date is in the future, except that, in both cases, the final <newline> produced by date shall not be included and the output shall be as if the date command were executed at the time of the last modification date of the file rather than the current time. When the LC_TIME locale category is not set to the POSIX Locale, a different format and order of presentation of this field may be used.

If the file is a character special or block special file, the size of the file may be replaced with implementation-defined information associated with the device in question.

If the pathname was specified as a file operand, it shall be written as specified.

The file mode written under the -l option shall consist of the following format:
    "%c%s%s%s%c", <entry type>, <owner permissions>,
        <group permissions>, <other permissions>,
        <optional alternate access method flag>

The <optional alternate access method flag> shall be a single <space> if there is no alternate or additional access control method associated with the file; otherwise, a printable character shall be used.

The <entry type> character shall describe the type of file, as follows:

    d   Directory
    b   Block special file
    c   Character special file
    p   FIFO
    -   Regular file

Implementations may add other characters to this list to represent other, implementation-defined, file types.

The next three fields shall be three characters each:

    <owner permissions>   Permissions for the file owner class (see 2.9.1.3).
    <group permissions>   Permissions for the file group class.
    <other permissions>   Permissions for the file other class.

Each field shall have three character positions:

    (1) If r, the file is readable; if -, it is not readable.
    (2) If w, the file is writable; if -, it is not writable.
    (3) The first of the following that applies:
        S   If in <owner permissions>, the file is not executable and set-user-ID mode is set. If in <group permissions>, the file is not executable and set-group-ID mode is set.
        s   If in <owner permissions>, the file is executable and set-user-ID mode is set. If in <group permissions>, the file is executable and set-group-ID mode is set.
        x   The file is executable or the directory is searchable.
        -   None of the attributes of S, s, or x applies.
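The rules for the three permission fields can be illustrated with a small POSIX shell sketch that renders them from a numeric mode. This is an illustration only, not a required implementation. The function names ls_perms and perm_field are hypothetical, and the argument is assumed to be an octal mode such as 4755. The sketch covers only the characters listed above: it omits the entry type, the alternate access method flag, and any implementation-defined additions.

```shell
# perm_field BITS SETID: print one three-character permission field.
# BITS is the class's 3-bit rwx value; SETID is 1 when set-user-ID
# (owner field) or set-group-ID (group field) is set, otherwise 0.
perm_field() {
    r=-; w=-; x=-
    [ $(($1 & 4)) -ne 0 ] && r=r
    [ $(($1 & 2)) -ne 0 ] && w=w
    if [ "$2" -eq 1 ]; then
        # s if the class may execute the file, S if it may not
        if [ $(($1 & 1)) -ne 0 ]; then x=s; else x=S; fi
    elif [ $(($1 & 1)) -ne 0 ]; then
        x=x
    fi
    printf '%s%s%s' "$r" "$w" "$x"
}

# ls_perms OCTAL_MODE: hypothetical helper that prints the owner, group,
# and other permission fields of a mode such as 755 or 4755.
ls_perms() {
    m=$((0$1))                                           # leading 0: parse as octal
    perm_field $(( (m >> 6) & 7 )) $(( (m >> 11) & 1 ))  # owner; S_ISUID is 04000
    perm_field $(( (m >> 3) & 7 )) $(( (m >> 10) & 1 ))  # group; S_ISGID is 02000
    perm_field $(( m & 7 )) 0                            # other: only x or -
    printf '\n'
}
```

For example, ls_perms 2644 prints rw-r-Sr--. The group class lacks execute permission, so its set-group-ID bit appears as S rather than s.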
Implementations may add other characters to this list for the third character position. Such additions shall, however, be written in lowercase if the file is executable or searchable, and in uppercase if it is not.

If the -l option is specified, each list of files within the directory shall be preceded by a status line. That line shall give the number of file system blocks occupied by files in the directory, in 512-byte units, rounded up to the next integral number of units if necessary. In the POSIX Locale, the format shall be:

    "total %u\n", <number of units in the directory>

If more than one directory, or a combination of nondirectory files and directories, is written, each list of files within a directory shall be preceded by the following. This applies whether the output results from specifying multiple operands or from the -R option.

    "\n%s:\n", <directory name>

If this string is the first thing to be written, the first <newline> character shall not be written. This output shall precede the number of units in the directory.

4.39.6.2 Standard Error

Used only for diagnostic messages.

4.39.6.3 Output Files

None.

4.39.7 Extended Description

None.

4.39.8 Exit Status

The ls utility shall exit with one of the following values:

    0   All files were written successfully.
    >0  An error occurred.

4.39.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.39.10 Rationale.
(This subclause is not a part of P1003.2)

Examples, Usage

An example of a small directory tree being fully listed with ls -laRF a in the POSIX Locale:

    total 11
    drwxr-xr-x   3 hlj  prog    64 Jul  4 12:07 ./
    drwxrwxrwx   4 hlj  prog  3264 Jul  4 12:09 ../
    drwxr-xr-x   2 hlj  prog    48 Jul  4 12:07 b/
    -rwxr--r--   1 hlj  prog   572 Jul  4 12:07 foo*

    a/b:
    total 4
    drwxr-xr-x   2 hlj  prog    48 Jul  4 12:07 ./
    drwxr-xr-x   3 hlj  prog    64 Jul  4 12:07 ../
    -rw-r--r--   1 hlj  prog   700 Jul  4 12:07 bar

Many implementations use the equals-sign (=) and the at-sign (@) with the -F option to denote sockets bound to the file system and symbolic links, respectively. Similarly, many historical implementations use the ``s'' character and the ``l'' character as the -l entry type characters for sockets and symbolic links, respectively. New implementations should not use these characters to signify any other types of files.

It is difficult for an application to use every part of the file modes field of ls -l in a portable manner. Certain file types and executable bits are not guaranteed to be exactly as shown, as implementations may have extensions. Applications can pass this field directly to a user printout or prompt, but actions based on its contents should generally be deferred to the test utility (see 4.62).

The output of ls (with the -l option) contains information that logically could be used by utilities such as chmod and touch to restore files to a known state. However, this information is presented in a format that cannot be used directly by those utilities or be easily translated into a format that can be used. In POSIX.2, a character was added to the end of the permissions string so that applications will at least have an indication that they may be working in an area they do not understand instead of assuming that they can translate the permissions string into
something that can be used. POSIX.6 may define one or more specific characters to be used for different standard additional or alternative access control mechanisms.

Some historical implementations of the ls utility show all entries in a directory except dot and dot-dot when the super-user invokes ls without specifying the -a option. When ``normal'' users invoke ls without specifying -a, they should not see information about any files with names beginning with a period unless they were named as file operands.

As with many of the utilities that deal with file names, the output of ls for multiple files, or in one of the long listing formats, must be used carefully on systems where file names can contain embedded white space. It is recommended that systems and system administrators institute policies and user training to limit the use of such file names.

History of Decisions Made

Implementations are expected to traverse arbitrary depths when processing the -R option. The only limitation on depth should be based on running out of physical storage for keeping track of untraversed directories.

The -1 (one) option is currently found in BSD and BSD-derived implementations only. It was required in the standard so that portable applications might ensure that output is one entry per line, even if the output is to a terminal. Recent changes to the utility argument syntax guidelines in 2.10.2 allow numeric options.

Generally, the standard is silent about what happens when options are given multiple times. In the case of -C, -l, and -1, however, it does specify the results of these overlapping options. Since ls is one of the most aliased commands, it is important that the implementation do the correct thing. For example, if the alias were alias ls="ls -C" and the user typed ``ls -1'', single text column output should result, not an error.
(The working group is aware that aliases are not included in the standard; this is just an example.)

The SVID defines a -x option for multi-text-column output sorted horizontally. The working group felt that -x provided only limited additional functionality over the -C option.

The SVID also provides a -m option for a comma-separated list of files. It was not included because similar functionality (easier for scripts to parse) can be provided by the echo and printf utilities.

Nonetheless, implementations considering adding new options to ls should look at historical BSD and System V versions of ls to avoid naming conflicts.

The BSD ls provides a -A option (like -a, but dot and dot-dot are not written out). The small difference from -a did not seem important enough to require both.

Implementations are allowed to make -q the default for terminals to prevent Trojan Horse attacks on terminals with special escape sequences. This is not required because:

    - Some control characters may be useful on some terminals; for example, a system might write them as \001 or ^A.
    - Special behavior for terminals is not relevant to application portability.

The -s option provided by existing implementations is not required by this standard. The number of disk blocks it reports for a file varies with the underlying file system type, the block size units reported, and the method of calculating the number of blocks. On some file system types, the number is the actual number of blocks occupied by the file (counting indirect blocks and ignoring holes in the file); on others it is calculated from the file size (usually making an allowance for indirect blocks, but ignoring holes).
The former is probably more useful, but depends on information not required by POSIX.1 {8} and not readily accessible on some file system types. Therefore, applications cannot depend on -s to provide any portable information. Implementations are urged to continue to provide this option, but applications should use the file size reported by the -l option in any calculations about the space needed to store a file.

An earlier draft specified that the optional alternate access method flag had to be ``+'' if there was an alternate access method used on the file, or <space> if there was not. Draft 10 changed this to <space> if there is not and a single printable character if there is. This was done for three reasons:

    1) There are existing implementations using characters other than ``+''.
    2) There are implementations that vary the character used in that position to distinguish between various alternate access methods in use.
    3) The developers of the standard did not want to preclude specification by POSIX.6 that might need a way to specify more than one alternate access method.

Nonetheless, implementations providing a single alternate access method are encouraged to use ``+''.

In a previous draft, the units used to specify the number of blocks occupied by files in a directory in an ls -l listing were implementation defined. This was because BSD systems have historically used 1024-byte units and System V systems have historically used 512-byte units. It was pointed out by developers at Berkeley that BSD has used 512-byte units in some places and 1024-byte units in other places. (System V has consistently used 512.) Therefore, POSIX.2 and POSIX.2a usually specify 512, and that value has been restored here as it was in Draft 9.
Future releases of BSD are expected to consistently provide 512 as a default with a way of specifying 1024-byte units where appropriate.

The <date and time> field in the -l format is specified only for the POSIX Locale. As noted, the format can be different in other locales. This standard provides no mechanism for defining it, because the appropriate vehicle is a messaging system; i.e., the format should be specified as a ``message.''

END_RATIONALE

4.40 mailx - Process messages

4.40.1 Synopsis

    mailx [-s subject] address ...

4.40.2 Description

The mailx utility shall read standard input and send it to one or more addresses in an unspecified manner. Unless the first character of one or more lines is tilde (~), all characters in the input message shall appear in the delivered message, but additional characters may be inserted in the message before it is retrieved.

4.40.3 Options

The mailx utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following option shall be supported by the implementation:

-s subject
    A string representing the subject of the message. All characters in the subject string shall appear in the delivered message. The results are unspecified if subject is longer than {LINE_MAX} - 10 bytes or contains a <newline>.

4.40.4 Operands

The following operand shall be supported by the implementation:

address
    Send a message to address. Valid login names on the local system shall be accepted as valid addresses. The interpretation of other types of addresses is unspecified. An implementation-defined way for a user with a login-name address to retrieve the message shall be provided by the implementation.
4.40.5 External Influences

4.40.5.1 Standard Input

The standard input shall be a text file. The results are unspecified if the first character of any input line is a tilde (~).

4.40.5.2 Input Files

None.

4.40.5.3 Environment Variables

The following environment variables shall affect the execution of mailx:

DEAD
    This variable shall affect the processing of signals by mailx: if the application sets this variable to /dev/null, the results of receiving a signal are as described by this standard; they are otherwise unspecified.

HOME
    This variable shall be interpreted as a pathname of the user's home directory.

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

MAILRC
    This variable shall affect the startup processing of mailx: if the application sets this variable to /dev/null, mailx shall operate as described by this standard; otherwise, unspecified results occur.

4.40.5.4 Asynchronous Events

Default.

4.40.6 External Effects

4.40.6.1 Standard Output

None.

4.40.6.2 Standard Error

Used only for diagnostic messages.

4.40.6.3 Output Files

None.

4.40.7 Extended Description

None.

4.40.8 Exit Status

The mailx utility shall exit with one of the following values:

    0   Successful completion.
    >0  An error occurred.

4.40.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.40.10 Rationale.

(This subclause is not a part of P1003.2)

Usage, Examples

The intent is that the following are delivered to the users specified by the given addresses:
- a header indicating who sent the message;
- a message subject string;
- the contents of the standard input;
- perhaps a trailer.

The standard input, however, may have to be manipulated slightly as it passes through the message delivery system, to avoid confusion between message text and headers. POSIX.2 does not specify how standard input may be manipulated; that will be specified in detail by POSIX.2a.

The restriction of a subject line to {LINE_MAX} - 10 bytes is based on the historical format, which consumes 10 bytes for "Subject: " and the trailing <newline>. Many historical mailers that a message may encounter on other systems will not be able to handle lines that long, however.

History of Decisions Made

The developers of the standard felt strongly that a method for applications to send messages to specific users was necessary. The obvious example is a batch utility, running noninteractively, that wishes to communicate errors or results to a user. However, the actual format, delivery mechanism, and method of reading the message are clearly beyond the scope of this standard.

The intent of this command is to provide a simple, portable interface for sending messages noninteractively. It merely defines a ``front-end'' to the historical mail system. It is suggested that implementations explicitly denote the sender and recipient in the body of the delivered message.
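Before calling mailx, a script can check for the inputs this clause leaves unspecified:
- a subject longer than {LINE_MAX} - 10 bytes;
- a subject that contains a <newline>;
- a body line that begins with a tilde.

The following POSIX shell sketch is illustrative only. The function names mailx_input_ok and send_report are hypothetical. The fallback value 2048 (the POSIX.2 minimum for {LINE_MAX}) is an assumption for systems where getconf cannot report the limit. ${#1} counts characters, not bytes, so it measures bytes only in the POSIX Locale.

```shell
# mailx_input_ok SUBJECT < BODY
# Hypothetical pre-flight check: succeeds only if SUBJECT and the body
# read from standard input avoid the cases left unspecified for mailx.
mailx_input_ok() {
    line_max=$(getconf LINE_MAX 2>/dev/null) || line_max=
    : "${line_max:=2048}"              # assumed fallback: POSIX.2 minimum
    nl='
'
    case $1 in
        *"$nl"*) return 1 ;;           # subject contains a <newline>
    esac
    # Character count; equals the byte count in the POSIX Locale.
    [ "${#1}" -le $((line_max - 10)) ] || return 1
    # Reject the body if any line begins with a tilde.
    if grep '^~' >/dev/null; then
        return 1
    fi
    return 0
}

# send_report SUBJECT ADDRESS FILE
# Hypothetical wrapper: MAILRC and DEAD are set to /dev/null so that
# startup processing and signal handling are as this standard describes.
send_report() {
    mailx_input_ok "$1" < "$3" || return 1
    MAILRC=/dev/null DEAD=/dev/null mailx -s "$1" "$2" < "$3"
}
```

Setting both variables only in the environment of the mailx command leaves the rest of the script's environment unchanged.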
Formats for the message envelope and for the message itself were deliberately left unspecified. The industry is in the midst of changing from the current standards to a more internationalized standard, and it is probably incorrect, at this time, to require either one. Implementations are encouraged to conform to the various delivery mechanisms described in ARPANET Requests for Comment Numbers 819, 822, 882, 920, 921, and the CCITT X.400 standards.

The standard does not place any restrictions on the length of messages handled by mailx. For delivery of local messages, the only limitations should be the normal problems of available disk space for the target mail file. When sending messages to external machines, applications are advised to limit messages to less than 50 kilobytes because many mail gateways impose message-length restrictions. (This is usually an administrative issue, based on the amount of mail traffic and disk space available on the gateways. Therefore, there is no way for this standard to require implementations to guarantee delivery of long messages to remote systems.)

Like the utilities logger and lp, mailx is admittedly difficult to test. This was not deemed sufficient justification to exclude these utilities from the standard. It is also arguable that they are, in fact, testable, but that the tests themselves are not portable.

Before Draft 7, there was a utility named mailto. In Draft 7, the name was changed to sendto because of comments noting that mailto implied full mail-like functionality, which was not what the specification provided. However, there have been consistent comments that it does not make sense to end up with a standard that will require two mail-sending interfaces.
(POSIX.2a is working on a fully fleshed-out mail-sending and -reading utility based on the historical System V mailx utility.) A message- (or mail-) sending utility that is a subset of the interactive utility that will be described by POSIX.2a is much more consistent with the rest of the standard. Therefore, in Draft 10 the name was changed again, to mailx, and the description is a small subset of the functionality being specified by POSIX.2a. It provides a portable way for a shell script to send a message to a user on the local system. Implementations that have provided mailx in the past are expected to use it to meet the POSIX.2 requirements. Implementations that have not provided mailx in the past will be able to create a simple interface to their current mailer to meet these requirements.

Most of the features provided by the mailx utility (and the similar BSD Mail utility) are not specified here, for two reasons:
- they are not needed for noninteractive use (applications do not usually read mail without user participation);
- they depend on other interactive features that are not defined by POSIX.2 but will be defined by POSIX.2a (the v command, for instance, uses the vi editor as a default).

If the DEAD environment variable is not set to /dev/null, historical versions of mailx and Mail save a message being constructed in a file under some circumstances when some asynchronous events occur. The details will be specified by POSIX.2a.

If the MAILRC environment variable does not name an empty file, historical versions of mailx and Mail read initialization commands from a file before processing begins. Since the initialization that a user specifies could alter the contents of messages an application is trying to send, applications are advised to set MAILRC to /dev/null. POSIX.2a
will specify details on the format of the initialization file.

Options to specify addresses as ``cc'' (carbon-copy) or ``bcc'' (blind-carbon-copy) were considered to be format details and were omitted.

A zero exit status implies that all messages were sent, but it gives no assurances that any of them were actually delivered. The reliability of the delivery mechanism is unspecified and is an appropriate marketing distinction between systems.

END_RATIONALE

4.41 mkdir - Make directories

4.41.1 Synopsis

    mkdir [-p] [-m mode] dir ...

4.41.2 Description

The mkdir utility shall create the directories specified by the operands, in the order specified.

For each dir operand, the mkdir utility shall perform actions equivalent to the POSIX.1 {8} mkdir() function, called with the following arguments:

    (1) The dir operand is used as the path argument.
    (2) The value of the bitwise inclusive OR of S_IRWXU, S_IRWXG, and S_IRWXO is used as the mode argument. (If the -m option is specified, the mode option-argument overrides this default.)

4.41.3 Options

The mkdir utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-m mode
    Set the file permission bits of the newly-created directory to the specified mode value. The mode option-argument shall be the same as the mode operand defined for the chmod utility (see 4.7). In the symbolic_mode strings, the op characters + and - shall be interpreted relative to an assumed initial mode of a=rwx; + shall add permissions to the default mode, - shall delete permissions from the default mode.
-p
    Create any missing intermediate pathname components.

    For each dir operand that does not name an existing directory, effects equivalent to those caused by the following command shall occur:

        mkdir -p -m $(umask -S),u+wx $(dirname dir) && mkdir [-m mode] dir

    where the [-m mode] option represents that option supplied to the original invocation of mkdir, if any.

    Each dir operand that names an existing directory shall be ignored without error.

4.41.4 Operands

The following operand shall be supported by the implementation:

dir
    A pathname of a directory to be created.

4.41.5 External Influences

4.41.5.1 Standard Input

None.

4.41.5.2 Input Files

None.

4.41.5.3 Environment Variables

The following environment variables shall affect the execution of mkdir:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.41.5.4 Asynchronous Events

Default.

4.41.6 External Effects

4.41.6.1 Standard Output

None.

4.41.6.2 Standard Error

Used only for diagnostic messages.

4.41.6.3 Output Files

None.

4.41.7 Extended Description

None.
4.41.8 Exit Status

The mkdir utility shall exit with one of the following values:

    0   All the specified directories were created successfully, or the -p option was specified and all the specified directories now exist.
    >0  An error occurred.

4.41.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.41.10 Rationale.

(This subclause is not a part of P1003.2)

Examples, Usage

The default file mode for directories is a=rwx (777) with selected permissions removed in accordance with the file mode creation mask. For intermediate pathname components created by mkdir, the mode is the default modified by u+wx, so that the subdirectories can always be created regardless of the file mode creation mask. If different ultimate permissions are desired for the intermediate directories, they can be changed afterward with chmod.

Application writers should note that some of the requested directories may have been created even if an error occurs.

History of Decisions Made

The System V -m option was added to control the file mode.

The System V -p option was added to create any needed intermediate directories. It complements the functionality provided by rmdir for removing directories in the path prefix as they become empty. Because no error is produced if any path component already exists, the -p option is also useful to ensure that a particular directory exists.

The functionality of mkdir is described substantially through a reference to the mkdir() function in POSIX.1 {8}. For example, by default, the mode of the directory is affected by the file mode creation mask in accordance with the specified behavior of POSIX.1 {8} mkdir(). In this way, there is less duplication of effort required for describing details of the directory creation.
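The command equivalence given for -p can be read as a recursive procedure. The following POSIX shell sketch is an illustration, not a required implementation. The function name mkdir_p is hypothetical, and it handles a single dir operand. It assumes that the symbolic mode printed by umask -S is accepted by mkdir -m. The function body is enclosed in parentheses so that each call runs in a subshell, which keeps the variables of each recursive call separate.

```shell
# mkdir_p DIR [MODE]: hypothetical emulation of "mkdir -p [-m MODE] DIR".
mkdir_p() (
    dir=$1
    mode=$2
    # A dir operand that names an existing directory is ignored without error.
    [ -d "$dir" ] && exit 0
    parent=$(dirname "$dir")
    if [ ! -d "$parent" ]; then
        # Intermediate components: mask-derived mode plus u+wx, so that
        # the next component down can always be created.
        mkdir_p "$parent" "$(umask -S),u+wx" || exit 1
    fi
    if [ -n "$mode" ]; then
        mkdir -m "$mode" "$dir"
    else
        mkdir "$dir"
    fi
)
```

For example, mkdir_p /tmp/a/b/c 700 creates /tmp/a and /tmp/a/b with the mask-derived mode plus u+wx, then creates /tmp/a/b/c with mode 700. A second identical call succeeds without changing anything.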
END_RATIONALE

4.42 mkfifo - Make FIFO special files

4.42.1 Synopsis

    mkfifo [-m mode] file ...

4.42.2 Description

The mkfifo utility shall create the FIFO special files specified by the operands, in the order specified.

For each file operand, the mkfifo utility shall perform actions equivalent to the POSIX.1 {8} mkfifo() function, called with the following arguments:

    (1) The file operand is used as the path argument.
    (2) The value of the bitwise inclusive OR of S_IRUSR, S_IWUSR, S_IRGRP, S_IWGRP, S_IROTH, and S_IWOTH is used as the mode argument. (If the -m option is specified, the mode option-argument overrides this default.)

4.42.3 Options

The mkfifo utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following option shall be supported by the implementation:

-m mode
    Set the file permission bits of the newly-created FIFO to the specified mode value. The mode option-argument shall be the same as the mode operand defined for the chmod utility (see 4.7). In the symbolic_mode strings, the op characters + and - shall be interpreted relative to an assumed initial mode of a=rw.

4.42.4 Operands

The following operand shall be supported by the implementation:

file
    A pathname of the FIFO special file to be created.

4.42.5 External Influences

4.42.5.1 Standard Input

None.

4.42.5.2 Input Files

None.
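The -m option and file operand above can be illustrated with a short POSIX shell fragment. It creates a FIFO that only its owner can read and write, passes one line through it, and removes it. The pathname (built from TMPDIR and the shell's process ID) and the message text are arbitrary choices for this sketch.

```shell
# Hypothetical FIFO pathname; $$ (the shell's process ID) keeps it unique.
fifo=${TMPDIR:-/tmp}/demo_fifo.$$
mkfifo -m 600 "$fifo" || exit 1

# Opening a FIFO for writing blocks until a reader opens it, so the
# writer runs in the background while the shell reads one line.
printf 'hello through a FIFO\n' > "$fifo" &
IFS= read -r line < "$fifo"
wait                      # reap the background writer
rm -f "$fifo"

printf '%s\n' "$line"
```

The background writer is needed: if the shell opened the FIFO for writing in the foreground, it would block with no reader to unblock it.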
4.42.5.3 Environment Variables

The following environment variables shall affect the execution of mkfifo:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.42.5.4 Asynchronous Events

Default.

4.42.6 External Effects

4.42.6.1 Standard Output

None.

4.42.6.2 Standard Error

Used only for diagnostic messages.

4.42.6.3 Output Files

None.

4.42.7 Extended Description

None.

4.42.8 Exit Status

The mkfifo utility shall exit with one of the following values:

    0   All the specified FIFO special files were created successfully.
    >0  An error occurred.

4.42.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.42.10 Rationale.

(This subclause is not a part of P1003.2)

Examples, Usage

None.

History of Decisions Made

This new utility was added to permit shell applications to create FIFO special files.

The -m option was added to control the file mode, for consistency with the similar functionality provided by the mkdir utility.

Earlier drafts included a -p option similar to mkdir's -p option that created intermediate directories leading up to the FIFO specified by the final component.
This was removed because it is not commonly needed and is not common practice with similar utilities.

The functionality of mkfifo is described substantially through a reference to the mkfifo() function in POSIX.1 {8}. For example, by default, the mode of the FIFO file is affected by the file mode creation mask in accordance with the specified behavior of POSIX.1 {8} mkfifo(). In this way, there is less duplication of effort required for describing details of the file creation.

END_RATIONALE

4.43 mv - Move files

4.43.1 Synopsis

    mv [-fi] source_file target_file
    mv [-fi] source_file ... target_dir

4.43.2 Description

In the first synopsis form, the mv utility shall move the file named by the source_file operand to the destination specified by the target_file. This first synopsis form is assumed when the final operand does not name an existing directory.

In the second synopsis form, mv shall move each file named by a source_file operand to a destination file in the existing directory named by the target_dir operand. The destination path for each source_file shall be the concatenation of the target directory, a single slash character, and the last pathname component of the source_file. This second form is assumed when the final operand names an existing directory.

If any operand specifies an existing file of a type not specified by POSIX.1 {8}, the behavior is implementation defined.
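The choice between the two synopsis forms, and the destination path used in the second form, can be sketched as a POSIX shell function. The name mv_destination is hypothetical. The sketch uses test -d to decide whether the final operand names an existing directory, and basename to obtain the last pathname component of the source_file operand.

```shell
# mv_destination SOURCE_FILE FINAL_OPERAND
# Hypothetical helper: print the destination path mv would use.
mv_destination() {
    if [ -d "$2" ]; then
        # Second form: target directory, a single slash, and the last
        # pathname component of the source_file operand.
        printf '%s/%s\n' "$2" "$(basename "$1")"
    else
        # First form: the final operand names the destination itself.
        printf '%s\n' "$2"
    fi
}
```

For example, if /tmp is an existing directory, mv_destination src/notes.txt /tmp prints /tmp/notes.txt.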
For each source_file the following steps shall be taken:

(1) If the destination path exists, the -f option is not specified, and either of the following conditions is true:

    (a) The permissions of the destination path do not permit writing and the standard input is a terminal.

    (b) The -i option is specified.

    then the mv utility shall write a prompt to standard error and read a line from standard input. If the response is not affirmative, mv shall do nothing more with the current source_file and go on to any remaining source_files.

(2) The mv utility shall perform actions equivalent to the POSIX.1 {8} rename() function, called with the following arguments:

    (a) The source_file operand is used as the old argument.

    (b) The destination path is used as the new argument.

    If this succeeds, mv shall do nothing more with the current source_file and go on to any remaining source_files. If this fails for any reason other than those described for the errno [EXDEV] in POSIX.1 {8}, mv shall write a diagnostic message to standard error, do nothing more with the current source_file, and go on to any remaining source_files.

(3) If the destination path exists, and either it is a file of type directory and source_file is not a file of type directory, or it is a file not of type directory and source_file is a file of type directory, mv shall write a diagnostic message to standard error, do nothing more with the current source_file, and go on to any remaining source_files.

(4) If the destination path exists, mv shall attempt to remove it.
If this fails for any reason, mv shall write a diagnostic message to standard error, do nothing more with the current source_file, and go on to any remaining source_files.

(5) The file hierarchy rooted in source_file shall be duplicated as a file hierarchy rooted in the destination path. The following characteristics of each file in the file hierarchy shall be duplicated:

    (a) The time of last data modification and time of last access.

    (b) The user ID and group ID.

    (c) The file mode.

    If the user ID, group ID, or file mode of a regular file cannot be duplicated, the file mode bits S_ISUID and S_ISGID shall not be duplicated.

    When files are duplicated to another file system, the implementation may require that the process invoking mv have read access to each file being duplicated.

    If the duplication of the file hierarchy fails for any reason, mv shall write a diagnostic message to standard error, do nothing more with the current source_file, and go on to any remaining source_files.

    If the duplication of the file characteristics fails for any reason, mv shall write a diagnostic message to standard error, but this failure shall not cause mv to modify its exit status.

(6) The file hierarchy rooted in source_file shall be removed. If this fails for any reason, mv shall write a diagnostic message to the standard error, do nothing more with the current source_file, and go on to any remaining source_files.

4.43.3 Options

The mv utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-f
    Do not prompt for confirmation if the destination path exists.
    Any previous occurrences of the -i option shall be ignored.

-i
    Prompt for confirmation if the destination path exists. Any previous occurrences of the -f option shall be ignored.

Specifying more than one of the -f or -i options shall not be considered an error. The last option specified shall determine mv's behavior.

4.43.4 Operands

The following operands shall be supported by the implementation:

source_file
    A pathname of a file or directory to be moved.

target_file
    A new pathname for the file or directory being moved.

target_dir
    A pathname of an existing directory into which to move the input files.

4.43.5 External Influences

4.43.5.1 Standard Input

Used to read an input line in response to each prompt specified in Standard Error (4.43.6.2). Otherwise, the standard input shall not be used.

4.43.5.2 Input Files

The input files specified by each source_file operand can be of any file type.

4.43.5.3 Environment Variables

The following environment variables shall affect the execution of mv:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_COLLATE
    This variable shall determine the locale for the behavior of ranges, equivalence classes, and multicharacter collating elements used in the extended regular expression defined for the yesexpr locale keyword in the LC_MESSAGES category.
LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments) and the behavior of character classes within regular expressions used in the extended regular expression defined for the yesexpr locale keyword in the LC_MESSAGES category.

LC_MESSAGES
    This variable shall determine the processing of affirmative responses and the language in which messages should be written.

4.43.5.4 Asynchronous Events

Default.

4.43.6 External Effects

4.43.6.1 Standard Output

None.

4.43.6.2 Standard Error

Prompts shall be written to the standard error under the conditions specified in 4.43.2. The prompts shall contain the destination pathname, but their format is otherwise unspecified. Otherwise, the standard error shall be used only for diagnostic messages.

4.43.6.3 Output Files

The output files may be of any file type.

4.43.7 Extended Description

None.

4.43.8 Exit Status

The mv utility shall exit with one of the following values:

 0  All input files were moved successfully.
>0  An error occurred.

4.43.9 Consequences of Errors

If the copying or removal of source_file is prematurely terminated by a signal or error, mv may leave a partial copy of source_file at the source or destination. The mv utility shall not modify both source_file and the destination path simultaneously; termination at any point shall leave either source_file or the destination path complete.

BEGIN_RATIONALE

4.43.10 Rationale.
(This subclause is not a part of P1003.2)

Examples, Usage

If the current directory contains only files a (of any type defined by POSIX.1 {8}), b (also of any type), and a directory c, the commands:

    mv a b c
    mv c d

will result in the original files a and b residing in the directory d in the current directory.

History of Decisions Made

Previous versions of this draft diverged from SVID and BSD historical practice in that they required that when the destination path exists, the -f option is not specified, and input is not a terminal, mv shall fail. This was done for compatibility with cp. This draft returns to historical practice. It should be noted that this is consistent with the POSIX.1 {8} function rename(), which does not require write permission on the target.

For absolute clarity, paragraph (1), describing mv's behavior when prompting for confirmation, should be interpreted in the following manner:

    if (exists AND (NOT f_option) AND ((not_writable AND input_is_terminal) OR i_option))

The -i option exists on BSD systems, giving applications and users a way to avoid accidentally unlinking files when moving others. When the standard input is not a terminal, the 4.3BSD mv deletes all existing destination paths without prompting, even when -i is specified; this is inconsistent with the behavior of the 4.3BSD cp utility, which always generates an error when the file is unwritable and the standard input is not a terminal. The working group decided that use of -i is a request for interaction, so when the destination path exists, the utility takes instructions from whatever responds to standard input.

The rename() function is able to move directories within the same file system. Some historical versions of mv have been able to move directories, but not to a different file system.
The working group felt that this was an annoying inconsistency, so the standard requires directories to be movable even across file systems. There is no -R option to confirm that moving a directory is actually intended, since such an option was not required for moving directories in historical practice. Requiring the application to specify it sometimes, depending on the destination, seemed just as inconsistent. The semantics of the rename() function were preserved as much as possible. For example, mv is not permitted to ``rename'' files to or from directories, even though they might be empty and removable.

Historical implementations of mv did not exit with a nonzero exit status if they were unable to duplicate any file characteristics when moving a file across file systems, nor did they write a diagnostic message for the user. The former behavior has been preserved to prevent scripts from breaking; a diagnostic message is now required, however, so that users are alerted that the file characteristics have changed.

The exact format of the interactive prompts is unspecified. Only the general nature of the contents of prompts is specified, because implementations may desire more descriptive prompts than those used on historical implementations. Therefore, an application not using the -f option or using the -i option relies on the system to provide the most suitable dialogue directly with the user, based on the behavior specified.

END_RATIONALE

4.44 nohup - Invoke a utility immune to hangups

4.44.1 Synopsis

    nohup utility [argument ...]

4.44.2 Description

The nohup utility shall invoke the utility named by the utility operand with arguments supplied as the argument operands.
At the time the named utility is invoked, the SIGHUP signal shall be set to be ignored.

If the standard output is a terminal, all output written by the named utility to its standard output shall be appended to the end of the file nohup.out in the current directory. If nohup.out cannot be created or opened for appending, the output shall be appended to the end of the file nohup.out in the directory specified by the HOME environment variable. If neither file can be created or opened for appending, utility shall not be invoked. If a file is created, the file's permission bits shall be set to S_IRUSR | S_IWUSR instead of the default specified in 2.9.1.4.

If the standard error is a terminal, all output written by the named utility to its standard error shall be redirected to the same file descriptor as the standard output.

4.44.3 Options

None.

4.44.4 Operands

The following operands shall be supported by the implementation:

utility
    The name of a utility that is to be invoked. If the utility operand names any of the special built-in utilities in 3.14, the results are undefined.

argument
    Any string to be supplied as an argument when invoking the utility named by the utility operand.

4.44.5 External Influences

4.44.5.1 Standard Input

None.

4.44.5.2 Input Files

None.

4.44.5.3 Environment Variables

The following environment variables shall affect the execution of nohup:

HOME
    This variable shall determine the pathname of the user's home directory: if the output file nohup.out cannot be created in the current directory, the nohup utility shall use the directory named by HOME to create the file.
LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

PATH
    This variable shall determine the search path that shall be used to locate the utility to be invoked. See 2.6.

4.44.5.4 Asynchronous Events

The nohup utility shall take the standard action for all signals (see 2.11.5.4), except that SIGHUP shall be ignored.

4.44.6 External Effects

4.44.6.1 Standard Output

If the standard output is not a terminal, the standard output of nohup shall be the standard output generated by the execution of the utility specified by the operands. Otherwise, nothing shall be written to the standard output.

4.44.6.2 Standard Error

If the standard output is a terminal, a message shall be written to the standard error, indicating the name of the file to which the output is being appended. The name of the file shall be either nohup.out or $HOME/nohup.out.

4.44.6.3 Output Files

If the standard output is a terminal, all output written by the named utility to the standard output and standard error is appended to the file nohup.out, which is created if it does not already exist.

4.44.7 Extended Description

None.
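The rules of 4.44.2 can be approximated by the following POSIX shell function. It is a hedged sketch, not a replacement for nohup. The name nohup_sketch is hypothetical. The S_IRUSR | S_IWUSR mode of a newly created nohup.out is obtained by creating the file under umask 077 in a subshell. The 126 and 127 exit statuses come from the shell's own handling of a failed exec, not from code in the function.

```shell
#!/bin/sh
# nohup_sketch UTILITY [ARGUMENT...]
# Hypothetical approximation of 4.44.2. The body is a subshell, so the
# caller's traps and file descriptors are left unchanged.
nohup_sketch() (
    if [ "$#" -eq 0 ]; then
        echo "usage: nohup_sketch utility [argument ...]" >&2
        exit 127
    fi

    # SIGHUP is ignored; an ignored signal stays ignored across exec.
    trap '' HUP

    if [ -t 1 ]; then
        # Try ./nohup.out, then $HOME/nohup.out. A file created here gets
        # mode 0600 (S_IRUSR | S_IWUSR) from the inner subshell's umask.
        if (umask 077 && : >> nohup.out) 2>/dev/null; then
            out=nohup.out
        elif [ -n "${HOME-}" ] &&
             (umask 077 && : >> "$HOME/nohup.out") 2>/dev/null; then
            out=$HOME/nohup.out
        else
            echo "nohup_sketch: cannot open nohup.out" >&2
            exit 127        # the utility is not invoked
        fi
        echo "nohup_sketch: appending output to $out" >&2
        exec >> "$out"
    fi

    # A terminal on standard error is redirected to standard output.
    if [ -t 2 ]; then
        exec 2>&1
    fi

    # Replace the subshell with the utility. If exec fails, the shell
    # exits with 127 (not found) or 126 (found but not executable).
    exec "$@"
)
```

Because the subshell is replaced by the utility, the exit status of nohup_sketch is otherwise that of the utility, as 4.44.8 requires.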
4.44.8 Exit Status

The nohup utility shall exit with one of the following values:

126  The utility specified by utility was found but could not be invoked.
127  An error occurred in the nohup utility or the utility specified by utility could not be found.

Otherwise, the exit status of nohup shall be that of the utility specified by the utility operand.

4.44.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.44.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

It is frequently desirable to apply nohup to pipelines or lists of commands. This can be done by placing pipelines and command lists in a single file; this file can then be invoked as a utility, and the nohup applies to everything in the file.

Alternatively, the following command can be used to apply nohup to a complex command:

    nohup sh -c 'complex-command-line'

The 4.3BSD version ignores SIGTERM and SIGHUP, and if ./nohup.out cannot be used, it fails instead of trying to use $HOME/nohup.out.

The command, env, nohup, and xargs utilities have been specified to use exit code 127 if an error occurs so that applications can distinguish ``failure to find a utility'' from ``invoked utility exited with an error indication.'' The value 127 was chosen because it is not commonly used for other meanings; most utilities use small values for ``normal error conditions'' and the values above 128 can be confused with termination due to receipt of a signal. The value 126 was chosen in a similar manner to indicate that the utility could be found, but not invoked. Some scripts produce meaningful error messages differentiating the 126 and 127 cases.
The distinction between exit codes 126 and 127 is based on KornShell practice that uses 127 when all attempts to exec the utility fail with [ENOENT], and uses 126 when any attempt to exec the utility fails for any other reason.

History of Decisions Made

The csh utility has a built-in version of nohup that acts differently from the nohup utility described here.

The term utility is used, rather than command, to highlight the fact that shell compound commands, pipelines, special built-ins, etc., cannot be used directly. However, utility includes user application programs and shell scripts, not just the standard utilities.

Historical versions of the nohup utility use default file creation semantics. Some more recent versions use the permissions specified here as an added security precaution.

Some historical implementations ignore SIGQUIT in addition to SIGHUP; others ignore SIGTERM. An earlier draft allowed, but did not require, SIGQUIT to be ignored. Several members of the balloting group objected, saying that nohup should only modify the handling of SIGHUP as required by this specification.

END_RATIONALE

4.45 od - Dump files in various formats

4.45.1 Synopsis

    od [-v] [-A address_base] [-j skip] [-N count] [-t type_string] ... [file ...]

4.45.2 Description

The od utility shall write the contents of its input files to standard output in a user-specified format.

4.45.3 Options

The od utility shall conform to the utility argument syntax guidelines described in 2.10.2, except that the order of presentation of the -t options is significant.
The following options shall be supported by the implementation:

-A address_base
    Specify the input offset base (see 4.45.7). The address_base option-argument shall be a character. The characters d, o, and x shall specify that the offset base shall be written in decimal, octal, or hexadecimal, respectively. The character n shall specify that the offset shall not be written.

-j skip
    Jump over skip bytes from the beginning of the input. The od utility shall read or seek past the first skip bytes in the concatenated input files. If the combined input is not at least skip bytes long, the od utility shall write a diagnostic message to standard error and exit with a nonzero exit status.

    By default, the skip option-argument shall be interpreted as a decimal number. With a leading 0x or 0X, the skip value shall be interpreted as a hexadecimal number; otherwise, with a leading 0, the skip value shall be interpreted as an octal number. Appending the character b, k, or m to the skip value shall cause it to be interpreted as a multiple of 512, 1024, or 1048576 bytes, respectively.

-N count
    Format no more than count bytes of input. By default, count shall be interpreted as a decimal number. With a leading 0x or 0X, count shall be interpreted as a hexadecimal number; otherwise, with a leading 0, it shall be interpreted as an octal number. If count bytes of input (after successfully skipping, if -j skip is specified) are not available, it shall not be considered an error; the od utility shall format the input that is available.

-t type_string
    Specify one or more output types (see 4.45.7). The type_string option-argument shall be a string specifying the types to be used when writing the input data.
    The string shall consist of the type specification characters a, c, d, f, o, u, and x, specifying named character, character, signed decimal, floating point, octal, unsigned decimal, and hexadecimal, respectively. The type specification characters d, f, o, u, and x can be followed by an optional unsigned decimal integer that specifies the number of bytes to be transformed by each instance of the output type. The type specification character f can be followed by an optional F, D, or L indicating that the conversion should be applied to an item of type float, double, or long double, respectively. The type specification characters d, o, u, and x can be followed by an optional C, S, I, or L indicating that the conversion should be applied to an item of type char, short, int, or long, respectively. Multiple types can be concatenated within the same type_string and multiple -t options can be specified. Output lines shall be written for each type specified in the order in which the type specification characters are specified.

-v
    Write all input data. Without the -v option, any number of groups of output lines, which would be identical to the immediately preceding group of output lines (except for the byte offsets), shall be replaced with a line containing only an asterisk (*).

4.45.4 Operands

The following operands shall be supported by the implementation:

file
    A pathname of a file whose contents are to be written. If no file operands are specified, the standard input shall be used. The results are unspecified if the first character of file is a plus-sign (+) or the first character of the first file operand is numeric, unless at least one of the -A, -j, -N, or -t options is specified.
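The number forms accepted by the -j option-argument can be sketched as a POSIX shell function that prints the resulting byte count. The function name od_skip_to_bytes is hypothetical. This draft does not say how a b suffix interacts with a hexadecimal value such as 0x1b. The sketch assumes, following later editions of POSIX, that a final b after 0x or 0X is a hexadecimal digit rather than a multiplier. Values are limited to the range of shell arithmetic.

```shell
#!/bin/sh
# od_skip_to_bytes SKIP
# Prints the byte count denoted by a -j option-argument, or returns 1
# if SKIP is not a valid number in any of the accepted forms.
od_skip_to_bytes() {
    num=$1
    mult=1
    # Optional multiplier suffix. Assumption: after 0x or 0X, a final
    # b is a hexadecimal digit, so only k and m act as multipliers there.
    case $num in
        *k)     mult=1024;    num=${num%k} ;;
        *m)     mult=1048576; num=${num%m} ;;
        0[xX]*) ;;
        *b)     mult=512;     num=${num%b} ;;
    esac
    # Validate the digits for the base selected by the prefix.
    case $num in
        0[xX]*)
            case ${num#0[xX]} in
                ''|*[!0-9a-fA-F]*) return 1 ;;
            esac ;;
        0*)
            case $num in
                *[!0-7]*) return 1 ;;
            esac ;;
        *)
            case $num in
                ''|*[!0-9]*) return 1 ;;
            esac ;;
    esac
    # Shell arithmetic reads 0x... as hexadecimal and 0... as octal.
    echo "$(( $num * $mult ))"
}
```

With this sketch, od_skip_to_bytes 0x15 prints 21, which matches the 21 bytes skipped by -j 0x15 in the examples of 4.45.10. The -N count argument uses the same prefixes but no suffixes.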
4.45.5 External Influences

4.45.5.1 Standard Input

The standard input shall be used only if no file operands are specified. See Input Files.

4.45.5.2 Input Files

The input files can be any file type.

4.45.5.3 Environment Variables

The following environment variables shall affect the execution of od:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

LC_NUMERIC
    This variable shall determine the locale for selecting the radix character used when writing floating-point formatted output.

4.45.5.4 Asynchronous Events

Default.

4.45.6 External Effects

4.45.6.1 Standard Output

See 4.45.7.

4.45.6.2 Standard Error

Used only for diagnostic messages.

4.45.6.3 Output Files

None.

4.45.7 Extended Description

The od utility shall copy sequentially each input file to standard output, transforming the input data according to the output types specified by the -t option(s). If no output type is specified, the default output shall be as if -t o2 had been specified.

The number of bytes transformed by the output type specifier c may be variable depending on the LC_CTYPE category.
The default number of bytes transformed by output type specifiers d, f, o, u, and x shall correspond to the various C-language types as follows. If the c89 compiler is present on the system, these specifiers shall correspond to the sizes used by default in that compiler. Otherwise, these sizes are implementation defined.

- For the type specifier characters d, o, u, and x, the default number of bytes shall correspond to the size of the underlying implementation's basic integral data type. For these specifier characters, the implementation shall support values of the optional number of bytes to be converted corresponding to the number of bytes in the C-language types char, short, int, and long. These numbers can also be specified by an application as the characters C, S, I, and L, respectively. The byte order used when interpreting numeric values is implementation defined, but shall correspond to the order in which a constant of the corresponding type is stored in memory on the system.

- For the type specifier character f, the default number of bytes shall correspond to the number of bytes in the underlying implementation's basic double precision floating point data type. The implementation shall support values of the optional number of bytes to be converted corresponding to the number of bytes in the C-language types float, double, and long double. These numbers can also be specified by an application as the characters F, D, and L, respectively.

The type specifier character a specifies that bytes shall be interpreted as named characters from the International Reference Version (IRV) of ISO/IEC 646 {1}. Only the least significant seven bits of each byte shall be used for this type specification. Bytes with the values listed in Table 4-8 shall be written using the corresponding names for those characters.
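The -t a conversion can be sketched as a POSIX shell function that maps one byte value (0 to 255) to the text written for it. The name od_named_char is hypothetical. For value 10 the sketch writes nl rather than lf, one of the two names permitted by Table 4-8.

```shell
#!/bin/sh
# od_named_char VALUE
# Prints the -t a rendering of one byte, given as a decimal value 0-255.
od_named_char() {
    # Only the least significant seven bits of the byte are used.
    v=$(( $1 % 128 ))
    if [ "$v" -le 32 ]; then
        # Names from Table 4-8 for values 0 through 32, in order.
        set -- nul soh stx etx eot enq ack bel bs ht nl vt ff cr so si \
               dle dc1 dc2 dc3 dc4 nak syn etb can em sub esc fs gs rs us sp
        shift "$v"
        printf '%s\n' "$1"
    elif [ "$v" -eq 127 ]; then
        printf '%s\n' del
    else
        # Any other value is a printable IRV character; %b turns the
        # \0ddd octal escape into that character.
        printf '%b\n' "\\0$(printf '%03o' "$v")"
    fi
}
```

Because only seven bits are used, od_named_char 193 prints A, the same as od_named_char 65.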
The type specifier character c specifies that bytes shall be interpreted as characters specified by the current setting of the LC_CTYPE locale category. Characters listed in Table 2-15 (see 2.12) shall be written as the corresponding escape sequences, except that backslash shall be written as a single backslash and a NUL shall be written as \0. Other nonprintable characters shall be written as one three-digit octal number for each byte in the character. If the size of a byte on the system is greater than nine bits, the format used for nonprintable characters is implementation defined. Printable multibyte characters shall be written in the area corresponding to the first byte of the character; the two-character sequence ** shall be written in the area corresponding to each remaining byte in the character, as an indication that the character is continued.

                    Table 4-8 - od Named Characters

    Value  Name      Value  Name      Value  Name        Value  Name
    -----  ----      -----  ----      -----  ---------   -----  ----
    \000   nul       \001   soh       \002   stx         \003   etx
    \004   eot       \005   enq       \006   ack         \007   bel
    \010   bs        \011   ht        \012   lf or nl    \013   vt
    \014   ff        \015   cr        \016   so          \017   si
    \020   dle       \021   dc1       \022   dc2         \023   dc3
    \024   dc4       \025   nak       \026   syn         \027   etb
    \030   can       \031   em        \032   sub         \033   esc
    \034   fs        \035   gs        \036   rs          \037   us
    \040   sp        \177   del

    NOTE: The \012 value may be written either as lf or nl.
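In a single-byte locale such as the POSIX locale, the -t c rules above reduce to a small mapping. The POSIX shell function below sketches that mapping. The name od_c_char is hypothetical. The escape sequences handled (\a, \b, \f, \n, \r, \t, \v) are assumed to be the contents of Table 2-15, which is not reproduced in this clause. Printability is tested for the POSIX locale only, taken here as values 32 through 126.

```shell
#!/bin/sh
# od_c_char VALUE
# Prints the -t c rendering of one byte (decimal 0-255), assuming a
# single-byte locale in which only values 32-126 are printable.
od_c_char() {
    case $1 in
        0)  printf '%s\n' '\0' ;;    # NUL is written as \0
        7)  printf '%s\n' '\a' ;;
        8)  printf '%s\n' '\b' ;;
        9)  printf '%s\n' '\t' ;;
        10) printf '%s\n' '\n' ;;
        11) printf '%s\n' '\v' ;;
        12) printf '%s\n' '\f' ;;
        13) printf '%s\n' '\r' ;;
        92) printf '%s\n' '\' ;;     # backslash is written as itself
        *)
            if [ "$1" -ge 32 ] && [ "$1" -le 126 ]; then
                # Printable: emit the character via a \0ddd escape.
                printf '%b\n' "\\0$(printf '%03o' "$1")"
            else
                # Other nonprintable byte: three-digit octal number.
                printf '%03o\n' "$1"
            fi ;;
    esac
}
```

A multibyte locale would also need the ** continuation markers described above, which this single-byte sketch does not attempt.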
The input data shall be manipulated in blocks, where a block is defined as a multiple of the least common multiple of the number of bytes transformed by the specified output types. If the least common multiple is greater than 16, the results are unspecified. Each input block shall be written as transformed by each output type, one per written line, in the order that the output types were specified. If the input block size is larger than the number of bytes transformed by the output type, the output type shall sequentially transform the parts of the input block and the output from each of the transformations shall be separated by one or more <blank>s.

If, as a result of the specification of the -N option or end-of-file being reached on the last input file, input data only partially satisfies an output type, the input shall be extended sufficiently with null bytes to write the last byte of the input.

Unless -A n is specified, the first output line produced for each input block shall be preceded by the input offset, cumulative across input files, of the next byte to be written. The format of the input offset is unspecified; however, it shall not contain any <blank>s, shall start at the first character of the output line, and shall be followed by one or more <blank>s. In addition, the offset of the byte following the last byte written shall be written after all the input data has been processed, but shall not be followed by any <blank>s. If no -A option is specified, the input offset base is unspecified.

4.45.8 Exit Status

The od utility shall exit with one of the following values:

 0  All input files were processed successfully.
>0  An error occurred.

4.45.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.45.10 Rationale.
(This subclause is not a part of P1003.2)

Examples, Usage

If a file containing 128 bytes with decimal values zero through 127, in increasing order, is supplied as standard input to the command:

    od -A d -t a

on an implementation using an input block size of 16 bytes, the standard output, independent of the current locale setting, would be similar to:

    0000000 nul soh stx etx eot enq ack bel  bs  ht  nl  vt  ff  cr  so  si
    0000016 dle dc1 dc2 dc3 dc4 nak syn etb can  em sub esc  fs  gs  rs  us
    0000032  sp   !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
    0000048   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
    0000064   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
    0000080   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
    0000096   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
    0000112   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ del
    0000128

Note that this standard allows nl or lf to be used as the name for the ISO/IEC 646 {1} IRV character with decimal value 10. The IRV names this character lf (line feed), but traditional implementations on which POSIX.2 is based have referred to this character as newline (nl), and the POSIX Locale character set symbolic name for the corresponding character is <newline>.

The command:

    od -A o -t o2x2x -N 18
on a system with 32-bit words and an implementation using an input block size of 16 bytes could write 18 bytes in approximately the following format:

    0000000 032056 031440 041123 042040 052516 044530 020043 031464
              342e   3320   4253   4420   554e   4958   2023   3334
                 342e3320      42534420      554e4958      20233334
    0000020 032472
              353a
                 353a0000
    0000022

The command:

    od -A d -t f -t o4 -t x4 -N 24 -j 0x15

on a system with 64-bit doubles (for example, the IEEE Std 754 double precision floating point format) would skip 21 bytes of input data and then write 24 bytes in approximately the following format:

    0000000    1.00000000000000e+00    1.57350000000000e+01
            07774000000 00000000000 10013674121 35341217270
               3ff00000    00000000    402f7851    eb851eb8
    0000016    1.40668230000000e+02
            10030312542 04370303230
               40619562    23e18698
    0000024

History of Decisions Made

The od utility has gone through several names in previous drafts, including hd, xd, and most recently hexdump. There were several objections to all of these based on the following reasons:

- The hd and xd names conflicted with existing utilities that behaved differently.

- The hexdump description was much more complex than needed for a simple dump utility.

- The od utility has been available on all traditional implementations and there was no need to create a new name for a utility so similar to the existing od utility.

The original reasons for not standardizing historical od were also fairly widespread. Those reasons are given below along with rationale explaining why the developers of this standard believe that this version does not suffer from the indicated problem:
634 4 Execution Environment Utilities Part 2: SHELL AND UTILITIES P1003.2/D11.2

  - The BSD and System V versions of od have diverged, and the intersection of features provided by both does not meet the needs of the user community. In fact, the System V version only provides a mechanism for dumping octal bytes and shorts, signed and unsigned decimal shorts, hexadecimal shorts, and ASCII characters. BSD added the ability to dump floats, doubles, named ASCII characters, and octal, signed decimal, unsigned decimal, and hexadecimal longs. The version presented here provides more normalized forms for dumping bytes, shorts, ints, and longs in octal, signed decimal, unsigned decimal, and hexadecimal; float, double, and long double; and named ASCII as well as current locale characters.

  - It would not be possible to come up with a compatible superset of the BSD and System V flags that met the requirements of this standard. The historical default od output is the specified default output of this utility. None of the option letters chosen for this version of od conflict with any of the options to historical versions of od.

  - On systems with different sizes for short, int, and long, there was no way to ask for dumps of ints, even in the BSD version. Because of the way the historical options are named, there is no easy way to extend the namespace to solve these problems. This is why the -t option was added, with type specifiers more closely matched to the printf() formats used in the rest of this standard, and why the optional field sizes were added to the d, f, o, u, and x type specifiers. It is also one of the reasons why the historical practice was not mandated as a required obsolescent form of od. (Although the old versions of od are not listed as an obsolescent form, implementations are urged to continue to recognize the old forms they have recognized for a few years.)
    The a, c, f, o, and x types match the meaning of the corresponding format characters in the historical implementations of od, except for the default sizes of the fields converted. The d format is signed in this specification to match the printf() notation. (Historical versions of od used d with the meaning that u has in this version. The System V implementation uses s for signed decimal; BSD uses i for signed decimal and s for null-terminated strings.) Other than d and u, all of the type specifiers match format characters in the historical BSD version of od.

    The sizes of the C-language types char, short, int, long, float, double, and long double are used even though it is recognized that there may be zero or more than one compiler for the C language on an implementation and that they may use different sizes for some of these types. [For example, one compiler might use 2-byte shorts, 2-byte ints, and 4-byte longs, while another compiler (or an option to the same compiler) uses 2-byte shorts, 4-byte ints, and 4-byte longs.] Nonetheless, there has to be a basic size known by the implementation for these types, corresponding to the values reported by invocations of the getconf utility (see 4.26) when called with system_var operands UCHAR_MAX, USHORT_MAX, UINT_MAX, and ULONG_MAX for the types char, short, int, and long, respectively. There are similar constants required by the C Standard {7}, but not required by POSIX.1 {8} or POSIX.2: FLT_MANT_DIG, DBL_MANT_DIG, and LDBL_MANT_DIG for the types float, double, and long double, respectively.
    If the optional c89 utility (see A.1) is provided by the implementation and used as specified by this standard, these are the sizes that would be provided. If an option is used that specifies different sizes for these types, there is no guarantee that the od utility will be able to correctly interpret binary data output by such a program. POSIX.2 requires that the numeric values of these lengths be recognized by the od utility and that symbolic forms also be recognized. Thus a portable application can always look at an array of unsigned long data elements using od -t uL.

  - The method of specifying the format for the address field based on specifying a starting offset in a file unnecessarily tied the two together. The -A option now specifies the address base and the -j option specifies a starting offset. Applications are warned not to use filenames starting with + or a first operand starting with a numeric character unless they specify one of the new options specified by POSIX.2, so that implementations can continue to provide the old functionality. To guarantee that one of these filenames will always be interpreted as a filename, an application can always specify the address base format with the -A option.

  - It would be hard to break the dependence on US ASCII to get an internationalized utility. It does not seem to be any harder for od to dump characters in the current locale than it is for the ed or sed l commands. The c type specifier does this with no problem and is completely compatible with the historical implementations of the c format character when the current locale uses a superset of ISO/IEC 646 {1} as a code set. The a type specifier (from the BSD a format character) was left as a portable means to dump ASCII [or, more correctly, ISO/IEC 646 {1} (IRV)] so that headers produced by pax can be deciphered even on systems that do not use ISO/IEC 646 {1} as a subset of their base code set.
    The use of ** as an indication of continuation of a multibyte character in c specifier output was chosen because an existing implementation was observed to use this method. The continuation bytes have to be marked in a way that will not be ambiguous with another single- or multibyte character.

An earlier draft used -S and -n, respectively, for the -j and -N options in this draft. These were changed to avoid conflicts with historical implementations.

END_RATIONALE

4.46 paste - Merge corresponding or subsequent lines of files

4.46.1 Synopsis

    paste [-s] [-d list] file ...

4.46.2 Description

The paste utility shall concatenate the corresponding lines of the given input files, and write the resulting lines to standard output.

The default operation of paste shall concatenate the corresponding lines of the input files. The <newline> character of every line except the line from the last input file shall be replaced with a <tab> character.

If an end-of-file condition is detected on one or more input files, but not all input files, paste shall behave as though empty lines were read from the file(s) on which end-of-file was detected, unless the -s option is specified.

4.46.3 Options

The paste utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-d list
    Unless a backslash character appears in list, each character in list is an element specifying a delimiter character. If a backslash character appears in list, the backslash character and one or more characters following it are an element specifying a delimiter character as described below. These elements specify one or more delimiters to use, instead of the default <tab>, to replace the <newline> character of the input lines.
    The elements in list shall be used circularly; i.e., when the list is exhausted, the first element from the list shall be reused. When the -s option is specified:

      - The last <newline> character in a file shall not be modified.

      - The delimiter shall be reset to the first element of list after each file operand is processed.

    When the -s option is not specified:

      - The <newline> characters in the file specified by the last file operand shall not be modified.

      - The delimiter shall be reset to the first element of list each time a line is processed from each file.

    If a backslash character appears in list, it and the character following it shall be used to represent the following delimiter characters:

      \n   <newline> character
      \t   <tab> character
      \\   backslash character
      \0   Empty string (not a null character). If \0 is immediately followed by the character x, the character X, or any character defined by the LC_CTYPE digit keyword (see 2.5.2.1), the results are unspecified.

    If any other characters follow the backslash, the results are unspecified.

-s
    Concatenate all of the lines of each separate input file in command-line order. The <newline> character of every line except the last line in each input file shall be replaced with the <tab> character, unless otherwise specified by the -d option.

4.46.4 Operands

The following operand shall be supported by the implementation:

file
    A pathname of an input file. If - is specified for one or more of the files, the standard input shall be used; the standard input shall be read one line at a time, circularly, for each instance of -. Implementations shall support pasting of at least 12 file operands.
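As an informal illustration (not part of P1003.2) of the circular reading of standard input for - operands, and of the circular delimiter list used with -s, the two hypothetical functions below (columns2 and pairs_serial are names chosen only for this sketch) should produce the same output for an even number of input lines: each output line joins two consecutive input lines with a <tab>.

```sh
#!/bin/sh
# Informal sketch; the function names are hypothetical.

# Two "-" operands: standard input is read one line at a time,
# circularly, so each output line joins two consecutive input
# lines separated by the default <tab> delimiter.
columns2() {
    paste - -
}

# With -s, all lines of the single input are concatenated and the
# delimiter list "\t\n" is used circularly: <tab>, <newline>, <tab>, ...
pairs_serial() {
    paste -s -d '\t\n' -
}

printf 'a\nb\nc\nd\n' | columns2
printf 'a\nb\nc\nd\n' | pairs_serial
```

Both pipelines write a<tab>b on the first output line and c<tab>d on the second.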
4.46.5 External Influences

4.46.5.1 Standard Input

The standard input shall be used only if one or more file operands are -. See Input Files.

4.46.5.2 Input Files

The input files shall be text files, except that line lengths shall be unlimited.

4.46.5.3 Environment Variables

The following environment variables shall affect the execution of paste:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.46.5.4 Asynchronous Events

Default.

4.46.6 External Effects

4.46.6.1 Standard Output

Concatenated lines of input files shall be separated by the <tab> character (or other characters under the control of the -d option) and terminated by a <newline> character.

4.46.6.2 Standard Error

Used only for diagnostic messages.

4.46.6.3 Output Files

None.

4.46.7 Extended Description

None.

4.46.8 Exit Status

The paste utility shall exit with one of the following values:

     0  Successful completion.
    >0  An error occurred.
4.46.9 Consequences of Errors

If one or more input files cannot be opened when the -s option is not specified, a diagnostic message shall be written to standard error, but no output shall be written to standard output. If the -s option is specified, the paste utility shall provide the default behavior described in 2.11.9.

BEGIN_RATIONALE

4.46.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

When the escape sequences of the list option-argument are used in a shell script, they must be quoted; otherwise, the shell treats the \ as a special character.

Write out a directory in four columns:

    ls | paste - - - -

Combine pairs of lines from a file into single lines:

    paste -s -d "\t\n" file

Portable applications should only use the specific backslash-escaped delimiters presented in this standard. Historical implementations treat \x, where x is not in this list, as x, but future implementations are free to expand this list to recognize other common escapes similar to those accepted by printf and other standard utilities.

Most of the standard utilities work on text files. The cut utility can be used to turn files with arbitrary line lengths into a set of text files containing the same data. The paste utility can be used to create (or recreate) files with arbitrary line lengths. For example, if file contains long lines, the commands:

    cut -b 1-500 -n file > file1
    cut -b 501- -n file > file2

create file1 (a text file) with lines no longer than 500 bytes (plus the <newline> character) and file2, which contains the remainder of the data from file. (Note that file2 will not be a text file if there are lines in file that are longer than 500 + {LINE_MAX} bytes.)
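As an informal sketch of this split-and-rejoin technique, the hypothetical function split_and_rejoin below runs the two cut commands, rejoins the halves with an empty delimiter ("\0"), and uses cmp to confirm that no data was lost. The function name, its working-directory argument, and the test data (one 1200-byte line and one short line) are assumptions for illustration only.

```sh
#!/bin/sh
# Informal sketch; split_and_rejoin and the file names are hypothetical.
# Usage: split_and_rejoin SOURCE WORKDIR
# Splits each line of SOURCE at byte 500, rejoins the two pieces with
# paste, and returns 0 only if the result is identical to SOURCE.
split_and_rejoin() {
    src=$1
    dir=$2
    cut -b 1-500 -n "$src" > "$dir/file1" || return 1
    cut -b 501- -n "$src" > "$dir/file2" || return 1
    # "\0" selects an empty delimiter, so the halves are joined directly.
    paste -d '\0' "$dir/file1" "$dir/file2" > "$dir/rejoined" || return 1
    cmp -s "$src" "$dir/rejoined"
}

# Demonstration with one 1200-byte line and one short line.
work=${TMPDIR:-/tmp}/splitdemo.$$
mkdir "$work" || exit 1
awk 'BEGIN { for (i = 0; i < 1200; i++) printf "x"; print ""; print "short" }' > "$work/file"
if split_and_rejoin "$work/file" "$work"; then
    echo "round trip preserved the data"
else
    echo "round trip changed the data"
fi
rm -rf "$work"
```

Lines shorter than 501 bytes produce empty lines in file2, which paste rejoins without adding anything, so short lines also survive the round trip unchanged.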
The original file can be recreated from file1 and file2 using the command:

    paste -d "\0" file1 file2 > file

The commands

    paste -d "\0" ...
    paste -d "" ...

are not necessarily equivalent; the latter is not specified by POSIX.2 and may result in an error. The construct \0 is used to mean ``no separator'' because historical versions of paste did not follow the syntax guidelines, and the command

    paste -d"" ...

could not be handled properly by getopt().

History of Decisions Made

Because most of the standard utilities work on text files, cut and paste are required to process lines of arbitrary length as a means of converting long lines from arbitrary sources into text files and converting processed text files back into files with arbitrary line lengths to interface with those applications that require long lines as input.

END_RATIONALE

4.47 pathchk - Check pathnames

4.47.1 Synopsis

    pathchk [-p] pathname ...

4.47.2 Description

The pathchk utility shall check that one or more pathnames are valid (i.e., they could be used to access or create a file without causing syntax errors) and portable (i.e., no filename truncation will result). More extensive portability checks are provided by the -p option.

By default, the pathchk utility shall check each component of each pathname operand based on the underlying file system.
A diagnostic shall be written for each pathname operand that:

  - is longer than {PATH_MAX} bytes (see Pathname Variable Values in POSIX.1 {8} 2.9.5),

  - contains any component longer than {NAME_MAX} bytes in its containing directory,

  - contains any component in a directory that is not searchable, or

  - contains any character in any component that is not valid in its containing directory.

The format of the diagnostic message is not specified, but shall indicate the error detected and the corresponding pathname operand.

It shall not be considered an error if one or more components of a pathname operand do not exist as long as a file matching the pathname specified by the missing components could be created that does not violate any of the checks specified above.

4.47.3 Options

The pathchk utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following option shall be supported by the implementation:

-p
    Instead of performing checks based on the underlying file system, write a diagnostic for each pathname operand that:

      - is longer than {_POSIX_PATH_MAX} bytes (see Minimum Values in POSIX.1 {8} 2.9.2),

      - contains any component longer than {_POSIX_NAME_MAX} bytes, or

      - contains any character in any component that is not in the portable filename character set (see 2.2.2.111).

4.47.4 Operands

The following operand shall be supported by the implementation:

pathname
    A pathname to be checked.

4.47.5 External Influences

4.47.5.1 Standard Input

None.

4.47.5.2 Input Files

None.
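As an informal illustration of the -p checks (not part of P1003.2), the hypothetical function report_portability below runs pathchk -p on each argument and reports the result. Assuming {_POSIX_NAME_MAX} is 14, as in POSIX.1 {8}, data.txt should pass, 'my file' should fail because the space is not in the portable filename character set, and the 15-byte name abcdefghijklmno should fail on length.

```sh
#!/bin/sh
# Informal sketch; report_portability is a hypothetical name.
# Prints one line per argument saying whether "pathchk -p" accepts it.
# pathchk's own diagnostics are discarded; only its exit status is used.
report_portability() {
    for p in "$@"; do
        if pathchk -p "$p" 2>/dev/null; then
            printf '%s: portable\n' "$p"
        else
            printf '%s: not portable\n' "$p"
        fi
    done
}

report_portability data.txt 'my file' abcdefghijklmno
```

Because -p does not consult the underlying file system, the result for each name should be the same on any conforming system.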
4.47.5.3 Environment Variables

The following environment variables shall affect the execution of pathchk:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.47.5.4 Asynchronous Events

Default.

4.47.6 External Effects

4.47.6.1 Standard Output

None.

4.47.6.2 Standard Error

Used only for diagnostic messages.

4.47.6.3 Output Files

None.

4.47.7 Extended Description

None.

4.47.8 Exit Status

The pathchk utility shall exit with one of the following values:

     0  All pathname operands passed all of the checks.
    >0  An error occurred.

4.47.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.47.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

To verify that all pathnames in an imported data interchange archive are legitimate and unambiguous on the current system:

    pax -f archive | xargs pathchk
    if [ $? -eq 0 ]
    then
        pax -r -f archive
    else
        echo Investigate problems before importing files.
        exit 1
    fi

To verify that all files in the current directory hierarchy could be moved to any POSIX.1 {8} conforming system that also supports the pax utility:

    find . -print | xargs pathchk -p
    if [ $? -eq 0 ]
    then
        pax -w -f archive .
    else
        echo Portable archive cannot be created.
        exit 1
    fi

To verify that a user-supplied pathname names a readable file and that the application can create a file extending the given path without truncation and without overwriting any existing file:

    case $- in
        *C*)  reset="";;
        *)    reset="set +C"
              set -C;;
    esac
    test -r "$path" && pathchk "$path.out" && rm "$path.out" > "$path.out"
    if [ $? -ne 0 ]; then
        printf "%s: %s not found or %s.out fails creation checks.\n" \
            "$0" "$path" "$path"
        $reset    # reset the noclobber option in case a trap
                  # on EXIT depends on it
        exit 1
    fi
    $reset
    PROCESSING < "$path" > "$path.out"

The following assumptions are made in this example:

  (1) PROCESSING represents the code that will be used by the application to use $path once it is verified that $path.out will work as intended.

  (2) The state of the noclobber option is unknown when this code is invoked and should be set on exit to the state it was in when this code was invoked. (The reset variable is used in this example to restore the initial state.)

  (3) Note the usage of rm "$path.out" > "$path.out":

      (a) The pathchk command has already verified, at this point, that $path.out will not be truncated.

      (b) With the noclobber option set, the shell will verify that $path.out does not already exist before invoking rm.

      (c) If the shell succeeded in creating $path.out, rm will remove it so that the application can create the file again in the PROCESSING step.
      (d) If the PROCESSING step wants the file to already exist when it is invoked, the

              rm "$path.out" > "$path.out"

          should be replaced with

              > "$path.out"

          which will verify that the file did not already exist, but leave $path.out in place for use by PROCESSING.

History of Decisions Made

The pathchk utility is new, commissioned for this standard. It, along with the set -C (noclobber) option added to the shell, replaces the mktemp, validfnam, and create utilities that appeared in earlier drafts. All of these utilities were attempts to solve a few common problems:

  - Verify the validity (for several different definitions of ``valid'') of a pathname supplied by a user, generated by an application, or imported from an external source,

  - Atomically create a file, and

  - Perform various string-handling functions to generate a temporary file name.

The test utility (see 4.62) can be used to determine if a given pathname names an existing file; it will not, however, give any indication of whether or not any component of the pathname was truncated in a directory where the {_POSIX_NO_TRUNC} feature (see Execution-Time Symbolic Constants for Portability Specification in POSIX.1 {8} 2.9.4) is not in effect. The pathchk utility provided here does not check for file existence; it performs checks to determine if a pathname does exist or could be created with no pathname component truncation.

The noclobber option added to the shell (see 3.14.11) can be used to atomically create a file. As with all file creation semantics in POSIX.1 {8}, it guarantees atomic creation, but still depends on applications to agree on conventions and cooperate on the use of files after they have been created.
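The atomic-creation property of noclobber can be combined with a retry loop to obtain a file name that no other cooperating process is using. The following is a minimal sketch, assuming a POSIX shell; the helper name create_unique, the $$.N suffix scheme, and the limit of 100 attempts are hypothetical choices made only for this illustration.

```sh
#!/bin/sh
# Informal sketch; create_unique, the "$$.N" suffix, and the limit of
# 100 attempts are hypothetical choices, not part of any standard.
# Usage: create_unique DIRECTORY PREFIX
# Prints the pathname of a newly created empty file, or returns 1.
create_unique() {
    dir=$1
    prefix=$2
    n=0
    while [ "$n" -lt 100 ]; do
        candidate=$dir/$prefix$$.$n
        # set -C runs in a subshell so the caller's noclobber state is
        # unchanged; with noclobber set, ">" fails if the file exists.
        if (set -C; : > "$candidate") 2>/dev/null; then
            printf '%s\n' "$candidate"
            return 0
        fi
        n=$((n + 1))
    done
    return 1
}

first=$(create_unique "${TMPDIR:-/tmp}" demo) || exit 1
second=$(create_unique "${TMPDIR:-/tmp}" demo) || exit 1
printf 'created %s and %s\n' "$first" "$second"
rm -f "$first" "$second"
```

Because the existence test and the creation happen in a single redirection, two processes racing for the same candidate name cannot both succeed; the loser simply moves on to the next suffix.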
The create utility, included in one earlier draft, provided checking and atomic creation in a single invocation of the utility; these are orthogonal issues and need not be grouped into a single utility. Note that the noclobber option also provides a way of creating a lock for process synchronization; since it provides an atomic create, there is no race between a test for existence and the subsequent creation of the file if it did not exist.

A function such as tmpnam() in the C Standard {7} is important in many high-level languages. The shell programming language, however, has built-in string manipulation facilities, making it very easy to construct temporary file names. The names needed obviously depend on the application, but are frequently of a form similar to

    $TMPDIR/application_abbreviation$$.suffix

In cases where there is likely to be contention for a given suffix, a simple shell for or while loop can be used with the shell noclobber option to create a file without risk of collisions, as long as applications trying to use the same filename namespace are cooperating on the use of files after they have been created.

END_RATIONALE

4.48 pax - Portable archive interchange

4.48.1 Synopsis

    pax [-cdnv] [-f archive] [-s replstr] ... [pattern ...]
    pax -r [-cdiknuv] [-f archive] [-o options] ... [-p string] ...
           [-s replstr] ... [pattern ...]
    pax -w [-dituvX] [-b blocksize] [ [-a] [-f archive] ] [-o options] ...
           [-s replstr] ... [-x format] [file ...]
    pax -r -w [-diklntuvX] [-p string] ... [-s replstr] ... [file ...]
           directory

4.48.2 Description

The pax utility shall read, write, and write lists of the members of archive files and copy directory hierarchies. A variety of archive formats shall be supported; see the -x format option description under 4.48.3.

The action to be taken depends on the presence of the -r and -w options:

  (1) When neither the -r option nor the -w option is specified, pax shall write the names of the members of the archive file read from the standard input, with pathnames matching the specified patterns, to standard output. If a named file is of type directory, the file hierarchy rooted at that file shall be written out as well.

  (2) When the -r option is specified, but the -w option is not, pax shall extract the members of the archive file read from the standard input, with pathnames matching the specified patterns. If an extracted file is of type directory, the file hierarchy rooted at that file shall be extracted as well. The extracted files shall be created relative to the current file hierarchy. The ownership, access and modification times, and file mode of the restored files are discussed under the -p option.

  (3) When the -w option is specified and the -r option is not, pax shall write the contents of the file operands to the standard output in an archive format. If no file operands are specified, a list of files to copy, one per line, shall be read from the standard input. A file of type directory shall include all of the files in the file hierarchy rooted at the file.

  (4) When both the -r and -w options are specified, pax shall copy the file operands to the destination directory. If no file operands are specified, a list of files to copy, one per line, shall be read from the standard input.
      A file of type directory shall include all of the files in the file hierarchy rooted at the file. The effect of the copy shall be as if the copied files were written to an archive file and then subsequently extracted, except that there may be hard links between the original and the copied files. If the destination directory is a subdirectory of one of the files to be copied, the results are unspecified. If the destination directory is a file of a type not defined by POSIX.1 {8}, the results are implementation defined; otherwise, it shall be an error for the file named by the directory operand not to exist, not to be writable by the user, or not to be a file of type directory.

If, when the -r option is specified, intermediate directories are necessary to extract an archive member, pax shall perform actions equivalent to the POSIX.1 {8} mkdir() function, called with the following arguments:

  - The intermediate directory used as the path argument.

  - The value of the bitwise inclusive OR of S_IRWXU, S_IRWXG, and S_IRWXO as the mode argument.

If any specified pattern or file operands are not matched by at least one file or archive member, pax shall write a diagnostic message to standard error for each one that did not match and exit with a nonzero exit status.

The supported archive formats shall be automatically detected on input. The default output archive format shall be implementation defined.

A single archive can span multiple files. The pax utility shall determine, in an implementation-defined manner, what file to read or write as the next file.

If the selected archive format supports the specification of linked files, it shall be an error if these files cannot be linked when the archive is extracted.
Any of the various names in the archive that represent a file can be used to select the file for extraction.

4.48.3 Options

The pax utility shall conform to the utility argument syntax guidelines described in 2.10.2, except that the order of presentation of the -s options is significant.

The following options shall be supported by the implementation:

-r
    Read an archive file from standard input.

-w
    Write files to the standard output in the specified archive format.

-a
    Append files to the end of the archive. It is implementation defined which devices on the system support appending. Additional file formats unspecified by this standard may impose restrictions on appending.

-b blocksize
    Block the output at a positive decimal integer number of bytes per write to the archive file. Devices and archive formats may impose restrictions on blocking. Blocking shall be automatically determined on input. Conforming POSIX.2 applications shall not specify a blocksize value larger than 32256. Default blocking when creating archives depends on the archive format. (See the -x option below.)

-c
    Match all file or archive members except those specified by the pattern or file operands.

-d
    Cause files of type directory being copied or archived or archive members of type directory being extracted to match only the file or archive member itself and not the file hierarchy rooted at the file.

-f archive
    Specify the pathname of the input or output archive, overriding the default standard input (when neither the -r option nor the -w option is specified, or the -r option is specified and the -w option is not) or standard output (when the -w option is specified and the -r option is not).

-i
    Interactively rename files or archive members.
    For each archive member matching a pattern operand or file matching a file operand, a prompt shall be written to the file /dev/tty. The prompt shall contain the name of the file or archive member, but the format is otherwise unspecified. A line shall then be read from /dev/tty. If this line is blank, the file or archive member shall be skipped. If this line consists of a single period, the file or archive member shall be processed with no modification to its name. Otherwise, its name shall be replaced with the contents of the line. The pax utility shall immediately exit with a nonzero exit status if end-of-file is encountered when reading a response or if /dev/tty cannot be opened for reading and writing.

-k
    Prevent the overwriting of existing files.

-l
    (The letter ell.) Link files. When both the -r and -w options are specified, hard links shall be made between the source and destination file hierarchies whenever possible.

-n
    Select the first archive member that matches each pattern operand. No more than one archive member shall be matched for each pattern (although members of type directory shall still match the file hierarchy rooted at that file).

-o options
    Provide information to the implementation to modify the algorithm for extracting or writing files that is specific to the file format specified by -x. This version of this standard does not specify any such options, and a Strictly Conforming POSIX.2 Application shall not use the -o option.

    NOTE: It is expected that future versions of POSIX.2 will offer additional file formats and that this option will be used by POSIX.2 and other POSIX standards to specify such features as international file-name and file codeset translations, security, accounting, etc., related to each additional format.

-p string
    Specify one or more file characteristic options (privileges).
    The string option-argument shall be a string specifying file characteristics to be retained or discarded on extraction. The string shall consist of the specification characters a, e, m, o, and p, and/or other, implementation-defined, characters. Multiple characteristics can be concatenated within the same string and multiple -p options can be specified. The meanings of the specification characters are as follows:

    a  Do not preserve file access times.

    e  Preserve the user ID, group ID, file mode bits (see 2.2.2.60), access time, modification time, and any other, implementation-defined, file characteristics.

    m  Do not preserve file modification times.

    o  Preserve the user ID and group ID.

    p  Preserve the file mode bits. Other, implementation-defined file-mode attributes may be preserved.

    In the preceding list, "preserve" indicates that an attribute stored in the archive shall be given to the extracted file, subject to the permissions of the invoking process; otherwise, the attribute shall be determined as part of the normal file creation action (see 2.9.1.4).

    If neither the e nor the o specification character is specified, or the user ID and group ID are not preserved for any reason, pax shall not set the S_ISUID and S_ISGID bits of the file mode.

    If the preservation of any of these items fails for any reason, pax shall write a diagnostic message to standard error. Failure to preserve these items shall affect the final exit status, but shall not cause the extracted file to be deleted.

    If file-characteristic letters in any of the string option-arguments are duplicated or conflict with each other, the one(s) given last shall take precedence. For example, if -p eme is specified, file modification times shall be preserved.
-s replstr  Modify file or archive member names named by pattern or file operands according to the substitution expression replstr, using the syntax of the ed utility (see 4.20). The concepts of "address" and "line" are meaningless in the context of the pax utility, and shall not be supplied. The format shall be:

        -s /old/new/[gp]

    where, as in ed, old is a basic regular expression and new can contain an ampersand, \n (where n is a digit) backreferences, or subexpression matching. The old string shall also be permitted to contain <newline> characters.

    Any nonnull character can be used as a delimiter (/ shown here). Multiple -s expressions can be specified; the expressions shall be applied in the order specified, terminating with the first successful substitution. The optional trailing g shall be as defined in the ed utility. The optional trailing p shall cause successful substitutions to be written to standard error. File or archive member names that substitute to the empty string shall be ignored when reading and writing archives.

-t  Cause the access times of the archived files to be the same as they were before being read by pax.

-u  Ignore files that are older (having a less recent file modification time) than a pre-existing file or archive member with the same name. If the -r option is specified and the -w option is not specified, an archive member with the same name as a file in the file system shall be extracted if the archive member is newer than the file. If the -w option is specified and the -r option is not specified, an archive file member with the same name as a file in the file system shall be superseded if the file is newer than the archive member.
    It is unspecified if this is accomplished by actual replacement in the archive or by appending to the archive. If both the -r and -w options are specified, the file in the destination hierarchy shall be replaced by the file in the source hierarchy or by a link to the file in the source hierarchy if the file in the source hierarchy is newer.

-v  Produce a verbose table of contents (see 4.48.6.1) if neither the -r option nor the -w option is specified. Otherwise, list archive member pathnames to standard error (see 4.48.6.2).

-x format  Specify the output archive format. The pax utility shall recognize the following formats:

    cpio   The extended cpio interchange format specified in POSIX.1 {8} 10.1.2. The default blocksize for this format for character special archive files shall be 5120. Implementations shall support all blocksize values less than or equal to 32256 that are multiples of 512.

    ustar  The extended tar interchange format specified in POSIX.1 {8} 10.1.1. The default blocksize for this format for character special archive files shall be 10240. Implementations shall support all blocksize values less than or equal to 32256 that are multiples of 512.

    Implementation-defined formats shall specify a default block size as well as any other block sizes supported for character special archive files.

    Any attempt to append to an archive file in a format different from the existing archive format shall cause pax to exit immediately with a nonzero exit status.

-X  When traversing the file hierarchy specified by a pathname, pax shall not descend into directories that have a different device ID [st_dev, see POSIX.1 {8} stat()].
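The -p rules above (specification characters read left to right across all -p option-arguments, later characters overriding earlier ones, and S_ISUID/S_ISGID dropped unless ownership is preserved) can be modelled with a short Python sketch. It is illustrative only, under stated assumptions: the attribute names and the functions resolve_p and extracted_mode are hypothetical; file times are assumed preserved when no -p option is given (following the rationale's description of -p p); the "normal file creation" case is modelled as the archived mode masked by an assumed umask of 022; and implementation-defined characters are simply rejected.

```python
import stat

# Hypothetical model of the attributes a pax implementation might track.
# Assumed default: file times preserved, ownership and mode bits not preserved.
DEFAULT_PRESERVE = {"atime": True, "mtime": True, "owner": False, "mode": False}


def resolve_p(option_args):
    """Fold the -p option-arguments left to right; the last letter wins."""
    state = dict(DEFAULT_PRESERVE)
    for arg in option_args:          # multiple -p options, in command-line order
        for ch in arg:               # multiple characters within one string
            if ch == "a":
                state["atime"] = False
            elif ch == "m":
                state["mtime"] = False
            elif ch == "o":
                state["owner"] = True
            elif ch == "p":
                state["mode"] = True
            elif ch == "e":
                state.update(atime=True, mtime=True, owner=True, mode=True)
            else:
                # Implementation-defined characters are not modelled here.
                raise ValueError(f"unsupported -p character: {ch!r}")
    return state


def extracted_mode(archived_mode, state, umask=0o022):
    """Mode bits an extracted file would receive under this sketch."""
    if state["mode"]:
        mode = archived_mode & 0o7777
    else:
        # Not preserved: behave like creat() given the archived mode and umask.
        mode = archived_mode & 0o7777 & ~umask
    if not state["owner"]:
        # Ownership not preserved, so set-user-ID and set-group-ID are dropped.
        mode &= ~(stat.S_ISUID | stat.S_ISGID)
    return mode
```

With this model, resolve_p(["eme"]) leaves modification times preserved, matching the -p eme example, while resolve_p(["p"]) keeps an archived 4755 mode as 0755 because ownership is not restored.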
The options that operate on the names of files or archive members (-c, -i, -n, -s, -u, and -v) shall interact as follows. When the -r option is specified and the -w option is not (archive members are being extracted), the archive members shall be "selected," based on the user-specified pattern operands as modified by the -c, -n, and -u options. Then, any -s and -i options shall modify, in that order, the names of the selected files. The -v option shall write names resulting from these modifications.

When the -w option is specified (files are being archived), the files shall be selected based on the user-specified pathnames as modified by the -n and -u options. Then, any -s and -i options shall, in that order, modify the names of these selected files. The -v option shall write names resulting from these modifications.

If both the -u and -n options are specified, pax shall not consider a file selected unless it is newer than the file to which it is compared.

4.48.4 Operands

The following operands shall be supported by the implementation:

directory  The destination directory pathname for copies when both the -r and -w options are specified.

file  A pathname of a file to be copied or archived.

pattern  A pattern matching one or more pathnames of archive members. A pattern shall be given in the name-generating notation of the pattern matching notation in 3.13, including the filename expansion rules in 3.13.3. The default, if no pattern is specified, is to select all members in the archive.

4.48.5 External Influences

4.48.5.1 Standard Input

If the -w option is specified, the standard input shall be used only if no file operands are specified.
It shall be a text file containing a list of pathnames, one per line, without leading or trailing <blank>s. If neither the -f nor -w options are specified, the standard input shall be an archive file. (See 4.48.5.2.) Otherwise, the standard input shall not be used.

4.48.5.2 Input Files

The input file named by the archive option-argument, or standard input when the archive is read from there, shall be a file formatted according to one of the specifications in POSIX.1 {8} 10.1, or some other, implementation-defined, format.

The file /dev/tty shall be used to write prompts and read responses.

4.48.5.3 Environment Variables

The following environment variables shall affect the execution of pax:

LANG  This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL  This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_COLLATE  This variable shall determine the locale for the behavior of ranges, equivalence classes, and multicharacter collating elements used in the pattern matching expressions for the pattern operand, the basic regular expression for the -s option, and the extended regular expression defined for the yesexpr locale keyword in the LC_MESSAGES category.

LC_CTYPE  This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files) and the behavior of character classes within regular expressions and pattern matching.
LC_MESSAGES  This variable shall determine the processing of affirmative responses and the language in which messages should be written.

LC_TIME  This variable shall determine the format and contents of date and time strings when the -v option is specified.

4.48.5.4 Asynchronous Events

Default.

4.48.6 External Effects

4.48.6.1 Standard Output

If the -w option is specified and neither the -f nor -r options are specified, the standard output shall be the archive formatted according to one of the specifications in POSIX.1 {8} 10.1, or some other implementation-defined format. (See -x format under 4.48.3.)

If neither the -r option nor the -w option is specified, the table of contents of the selected archive members shall be written to standard output using the following format:

    "%s\n", <pathname>

If neither the -r option nor the -w option is specified, but the -v option is specified, the table of contents of the selected archive members shall be written to standard output using the following formats.

For pathnames representing hard links to previous members of the archive:

    "%s == %s\n", <ls -l listing>, <linkname>

For all other pathnames:

    "%s\n", <ls -l listing>

where <ls -l listing> shall be the format specified by the ls utility (see 4.39) with the -l option. When writing pathnames in this format, it is unspecified what is written for fields for which the underlying archive format does not have the correct information, although the correct number of <blank>-separated fields shall be written.

When writing a table of contents of selected archive members, standard output shall not be buffered more than a line at a time.
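The two table-of-contents formats in 4.48.6.1 can be illustrated with a small Python sketch that lists a ustar archive, using the standard library's tarfile module as a stand-in for the archive reader of a real pax. The simplified <ls -l listing> is an assumption of this sketch: ustar records no link count, so 1 is written to keep the field count right (as 4.48.6.1 permits), and dates are formatted in UTC with a fixed "%b %d %H:%M" layout instead of following LC_TIME. The helper names ls_l_listing, list_archive and toc_text are hypothetical.

```python
import io
import stat
import tarfile
import time

# Map each tar member type to its file-type bits; ustar headers keep only
# the permission bits in the mode field.
TYPE_BITS = {
    tarfile.REGTYPE: stat.S_IFREG,
    tarfile.AREGTYPE: stat.S_IFREG,
    tarfile.LNKTYPE: stat.S_IFREG,
    tarfile.SYMTYPE: stat.S_IFLNK,
    tarfile.CHRTYPE: stat.S_IFCHR,
    tarfile.BLKTYPE: stat.S_IFBLK,
    tarfile.DIRTYPE: stat.S_IFDIR,
    tarfile.FIFOTYPE: stat.S_IFIFO,
}


def ls_l_listing(member):
    """Approximate an ls -l line for one tar member (simplified)."""
    mode = stat.filemode(TYPE_BITS.get(member.type, stat.S_IFREG) | member.mode)
    when = time.strftime("%b %d %H:%M", time.gmtime(member.mtime))
    owner = member.uname or str(member.uid)
    group = member.gname or str(member.gid)
    # The archive has no link count; "1" keeps the number of fields right.
    return "%s %d %s %s %d %s %s" % (
        mode, 1, owner, group, member.size, when, member.name)


def list_archive(fileobj, verbose, out):
    """Write a pax-style table of contents for an uncompressed tar stream."""
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for member in tar:
            if not verbose:
                out.write("%s\n" % member.name)
            elif member.islnk():
                # Hard link to a previous member of the archive.
                out.write("%s == %s\n" % (ls_l_listing(member), member.linkname))
            else:
                out.write("%s\n" % ls_l_listing(member))
            out.flush()  # never buffer more than one line


def toc_text(archive_bytes, verbose=False):
    """Convenience wrapper: list an in-memory archive and return the text."""
    out = io.StringIO()
    list_archive(io.BytesIO(archive_bytes), verbose, out)
    return out.getvalue()
```

The stream mode "r|" is used so that the same function could read an archive from a pipe on standard input, which is how pax is invoked when neither -f nor -w is given.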
4.48.6.2 Standard Error

If either or both of the -r option and the -w option are specified as well as the -v option, pax shall write the pathnames it processes to the standard error output using the following format:

    "%s\n", <pathname>

These pathnames shall be written as soon as processing is begun on the file or archive member, and shall be flushed to standard error. The trailing <newline>, which shall not be buffered, shall be written when the file has been read or written.

If the -s option is specified, and the replacement string has a trailing p, substitutions shall be written to standard error in the following format:

    "%s >> %s\n", <original pathname>, <new pathname>

In all operating modes of pax (see 4.48.2), optional messages of unspecified format concerning the input archive format and volume number, the number of files, blocks, volumes, and media parts as well as other diagnostic messages may be written to standard error.

In all formats, for both standard output and standard error, it is unspecified how nonprintable characters in pathnames or linknames are written.

4.48.6.3 Output Files

If the -r option is specified, the extracted or copied output files shall be of the archived file type. If the -w option is specified, but the -r option is not, the output file named by the -f option-argument shall be a file formatted according to one of the specifications in POSIX.1 {8} 10.1, or some other, implementation-defined, format.

4.48.7 Extended Description

None.

4.48.8 Exit Status

The pax utility shall exit with one of the following values:

0   All files were processed successfully.

>0  An error occurred.
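The -s rules from 4.48.3 (any nonnull delimiter, expressions tried in command-line order until one succeeds, the g and p flags, the "%s >> %s\n" report on standard error, and names that become empty being ignored) can be sketched in Python. This is a sketch under stated assumptions, not a conforming implementation: old is compiled with Python's re module rather than as a POSIX basic regular expression, so BRE syntax such as \( \) grouping is not translated (Python's ( ) is used instead); only a backslash-escaped delimiter is unescaped while splitting; and parse_s, ed_replacement and apply_s are hypothetical names.

```python
import re
import sys


def parse_s(arg):
    """Split an ed-style /old/new/[gp] argument into (old, new, flags)."""
    if not arg or arg[0] == "\0":
        raise ValueError("a nonnull delimiter is required")
    delim = arg[0]
    parts, cur, i = [], [], 1
    while i < len(arg):
        ch = arg[i]
        if ch == "\\" and i + 1 < len(arg) and arg[i + 1] == delim:
            cur.append(delim)          # escaped delimiter is taken literally
            i += 2
            continue
        if ch == delim:
            parts.append("".join(cur))
            cur = []
        else:
            cur.append(ch)             # other backslashes are kept for the regex
        i += 1
    parts.append("".join(cur))
    if len(parts) != 3 or set(parts[2]) - set("gp"):
        raise ValueError("malformed -s argument: %r" % arg)
    return tuple(parts)


def ed_replacement(match, new):
    """Expand & and backslash-digit references in an ed-style replacement."""
    out, i = [], 0
    while i < len(new):
        ch = new[i]
        if ch == "\\" and i + 1 < len(new):
            nxt = new[i + 1]
            if nxt.isdigit():
                out.append(match.group(int(nxt)) or "")
            else:
                out.append(nxt)        # an escaped character stands for itself
            i += 2
        elif ch == "&":
            out.append(match.group(0))  # & is the whole matched text
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def apply_s(name, exprs, log=sys.stderr.write):
    """Apply (old, new, flags) triples in order, stopping at the first match.

    Returns the new name, or None when the name became empty, meaning the
    file or archive member is ignored.
    """
    for old, new, flags in exprs:
        count = 0 if "g" in flags else 1   # re.subn treats 0 as "all"
        result, n = re.subn(old, lambda m: ed_replacement(m, new), name, count=count)
        if n:
            if "p" in flags:
                log("%s >> %s\n" % (name, result))
            return result or None
    return name
```

For the rationale's example -s ',^//*usr//*,,', the BRE happens to be valid Python syntax unchanged, so apply_s("/usr/bin/ls", [parse_s(",^//*usr//*,,")]) yields bin/ls, and a member named /usr/ becomes empty and is ignored.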
4.48.9 Consequences of Errors

If pax cannot create a file or a link when reading an archive or cannot find a file when writing an archive, or cannot preserve the user ID, group ID, or file mode when the -p option is specified, a diagnostic message shall be written to standard error and a nonzero exit status shall be returned, but processing shall continue. In the case where pax cannot create a link to a file, pax shall not, by default, create a second copy of the file.

If the extraction of a file from an archive is prematurely terminated by a signal or error, pax may have only partially extracted the file or (if the -n option was not specified) may have extracted a file of the same name as that specified by the user, but which is not the file the user wanted. Additionally, the file modes of extracted directories may have additional bits from the S_IRWXU mask set as well as incorrect modification and access times.

BEGIN_RATIONALE

4.48.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The following command:

    pax -w -f /dev/rmt/1m .

copies the contents of the current directory to tape drive 1, medium density (assuming historical System V device naming procedures; the historical BSD device name would be /dev/rmt9).

The following commands:

    mkdir newdir
    pax -rw olddir newdir

copy the olddir directory hierarchy to newdir.

    pax -r -s ',^//*usr//*,,' -f a.pax

reads the archive a.pax, with all files rooted in "/usr" in the archive extracted relative to the current directory.

The -p (privileges) option was invented to reconcile differences between historical tar and cpio implementations. In particular, the two utilities used -m in diametrically opposed ways.
The -p option also provides a consistent means of extending the ways in which future file attributes can be addressed, such as for enhanced security systems or high-performance files. Although it may seem complex, there are really two modes that will be most commonly used:

-p e  "Preserve everything." This would be used by the historical super-user, someone with all the appropriate privileges, to preserve all aspects of the files as they are recorded in the archive. The e flag is the sum of o and p, and other implementation-defined attributes.

-p p  "Preserve" the file mode bits. This would be used by the user with regular privileges who wished to preserve aspects of the file other than the ownership. The file times are preserved by default, but two other flags are offered to disable these and use the time of extraction.

History of Decisions Made

The description of pax was adopted from a command written by Glenn Fowler of AT&T. It is a new utility, commissioned for this standard.

The table of contents output is written to standard output to facilitate pipeline processing.

The output archive formats required are those defined in POSIX.1 {8}; others, such as the historical tar format, may be added as an extension.

The one pathname per line format of standard input precludes pathnames containing <newline>s. Although such pathnames violate the portable filename guidelines, they may exist and their presence may inhibit usage of pax within shell scripts. This problem is inherited from historical archive programs. The problem can be avoided by listing filename arguments on the command line instead of on standard input.

An earlier draft had hard links displaying for all pathnames.
This was removed because it complicates the output of the non -v case and does not match historical cpio usage. The hard-link information is available in the -v display.

The working group realizes that the presence of symbolic links will affect certain pax operations. Historical practice, in both System V and BSD-based systems, is that the physical traversal of the file hierarchy shall be the default, and an option is provided to cause the utility to do a logical traversal, that is, follow symbolic links. Historical practice has not been so consistent as to what option is used to cause the logical traversal; BSD systems have used -h (cp and tar) and -L (ls), while the SVID specifies -L (cpio and ls). Given this inconsistency, the -L option is recommended.

The archive formats described in POSIX.1 {8} have certain restrictions that have been brought along from historical usage. For example, there are restrictions on the length of pathnames stored in the archive. When pax is used in -rw mode, copying directory hierarchies, there is no stated dependency on these archive formats. Therefore, such restrictions should not apply.

The POSIX.2 working group is currently devising a new archive format to be published in a revision or amendment to this standard. It is expected that the ustar and cpio formats then will be retired from a future version of POSIX.1 {8}. This new format will address all restrictions and new requirements for security labeling, etc. The pax utility should be upward-compatible enough to handle any such changes. The reason that the default -x format output format is implementation defined is to reserve the default format for this new standard interface.

The -o option was devised to provide means of controlling the many aspects of international and security concerns without expending the entire alphabet of option letters for this, and possibly other, file formats. The -o string is meant to be specific for each -x format.
Control of various file permissions and attributes that can be expressed in a binary way will continue to use the -p (permissions) option; the -o will be reserved for more involved requirements and will probably take a

    pax -o name=value,name=value -o name=value

approach.

The fundamental difference in how cpio and tar viewed the world was in the way directories were treated. The cpio utility did not treat directories differently from other files, and to select a directory and its contents required that each file in the hierarchy be explicitly specified. For tar, a directory matched every file in the file hierarchy it rooted. The pax utility offers both interfaces; by default, directories map into the file hierarchy they root. The -d option causes pax to skip any file not explicitly referenced, as cpio traditionally did. The tar-style behavior was chosen as the default because it was believed that this was the more common usage, and because tar is the more commonly available interface, as it was historically provided on both System V and BSD implementations.

Because a file may be matched more than once without causing it to be selected multiple times, the traditional usage of piping an ls or find to the archive command works as always.

The Data Interchange Format specification of POSIX.1 {8} requires that processes with "appropriate privileges" shall always restore the ownership and permissions of extracted files exactly as archived. If viewed from the historic equivalence between super-user and "appropriate privileges," there are two problems with this requirement. First, users running as super-users may unknowingly set dangerous permissions on extracted files.
Second, it is needlessly limiting in that super-users cannot extract files and own them as super-user unless the archive was created by the super-user. (It should be noted that restoration of ownerships and permissions for the super-user, by default, is historical practice in cpio, but not in tar.) In order to avoid these two problems, the pax specification has an additional "privilege" mechanism, the -p option. Only a pax invocation with the POSIX.1 {8} privileges needed, and which has the -p option set using the e specification character, has the "appropriate privilege" to restore full ownership and permission information.

Note also that POSIX.1 {8} 10.1 requires that the file ownership and access permissions shall be set, on extraction, in the same fashion as the POSIX.1 {8} creat() function when provided the mode stored in the archive. This means that the file creation mask of the user is applied to the file permissions.

The default blocksize value of 5120 for cpio was selected because it is one of the standard block-size values for cpio, set when the -B option is specified. (The other default block-size value for cpio is 512, and this was felt to be too small.) The default block value of 10240 for tar was selected as that is the standard block-size value for BSD tar. The maximum block size of 32256 (2^15 - 512) is the largest multiple of 512 that fits into a signed 16-bit tape controller transfer register. There are known limitations in some historic systems that would prevent larger blocks from being accepted. Historic values were chosen to make compatibility with existing scripts using dd or similar utilities to manipulate archives more likely.
Also, default block sizes for any file type other than character special have been deleted from the standard as unimportant and not likely to affect the structure of the resulting archive.

Implementations are permitted to modify the block-size value based on the archive format or the device to which the archive is being written. This is to provide implementations the opportunity to take advantage of special types of devices, and should not be used without a great deal of consideration as it will almost certainly decrease archive portability.

The -n option in early drafts had three effects; the first was to cause special characters in patterns to not be treated specially. The second was to cause only the first file that matched a pattern to be extracted. The third was to cause pax to write a diagnostic message to standard error when no file was found matching a specified pattern. Only the second behavior is retained by POSIX.2, for many reasons. First, it is in general a bad idea for a single option to have multiple effects. Second, the ability to make pattern matching characters act as normal characters is useful for other parts of pax than just file extraction. Third, a finer degree of control over the special characters is useful, because users may wish to normalize only a single special character in a single file name. Fourth, given a more general escape mechanism, the previous behavior of the -n option can be easily obtained using the -s option or a sed script. Finally, writing a diagnostic message when a pattern specified by the user is unmatched by any file is useful behavior in all cases.

There are two methods of copying subtrees in POSIX.2. The other method is described as part of the cp utility (see 4.13). Both methods are historical practice: cp provides a simpler, more intuitive interface, while pax offers a finer granularity of control.
Each provides additional functionality to the other; in particular, pax maintains the hard-link structure of the hierarchy, while cp does not. It is the intention of the working group that the results be similar (using appropriate option combinations in both utilities). The results are not required to be identical; there seemed insufficient gain to applications to balance the difficulty of implementations having to guarantee that the results would be exactly identical.

A single archive may span more than one file. See POSIX.1 {8} 10.1.3. While POSIX.1 {8} only refers to reading the archive file, it is reasonable that the format utility may also determine, in an implementation-defined manner, the next file to write. It is suggested that implementations provide informative messages to the user on the standard error whenever the archive file is changed.

The -d option (do not create intermediate directories not listed in the archive) found in previous drafts of this standard was originally provided as a complement to the historic -d option of cpio. It has been deleted.

The -s option in earlier drafts specified a subset of the substitution command from the ed utility. As there was no reason for only a subset to be supported, the -s option is now compatible with the current ed specification. Since the delimiter can be any nonnull character, the following usage with single spaces is valid:

    pax -s " foo bar " ...

The -t option (specify an implementation-defined identifier naming an input or output device) found in earlier drafts has been deleted because it is not historical practice and of limited utility.
In particular, historic versions of neither cpio nor tar had the concept of devices that were not mapped into the file system; if the devices are mapped into the file system, the -f option is sufficient.

The -o and -p options found in previous versions of this standard have been renamed to be -p and -t, respectively, to correspond more closely with the historic tar and cp utilities.

The default behavior of pax with regard to file modification times is the same as historical implementations of tar. It is not the historical behavior of cpio.

Because the -i option uses /dev/tty, utilities without a controlling terminal will not be able to use this option.

The -y option, found in earlier drafts, has been deleted because a line containing a single period for the -i option has equivalent functionality. The special lines for the -i option (a single period and the empty line) are historical practice in cpio.

In earlier drafts, an -e charmap option was included to increase portability of files between systems using different coded character sets. This option was omitted because it was apparent that consensus could not be formed for it. It was an interface without implementation experience and overloaded the charmap file concept to provide additional uses its original authors had not intended. The developers of POSIX.2 will consider other mechanisms for transporting files with nonportable names as they develop the new interchange format, described earlier.
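The response handling required for -i (an empty line skips the member, a single period keeps its name, any other line becomes the new name, and end-of-file makes pax exit immediately with a nonzero status) is small enough to sketch directly. In this hypothetical Python helper, interactive_rename, the /dev/tty streams are passed in as read_line and write callables so the logic can be exercised without a terminal; the prompt wording and the treatment of a line containing only white space as blank are assumptions, since the standard leaves the prompt format unspecified.

```python
import sys


def interactive_rename(name, read_line, write):
    """Return the new name, the unchanged name, or None to skip the member.

    read_line() must return one response line including its <newline>, or ""
    at end-of-file, like readline() on a file opened on /dev/tty.
    """
    write("rename %s? " % name)      # the prompt format is otherwise unspecified
    line = read_line()
    if line == "":
        # End-of-file while reading a response: exit immediately, nonzero.
        sys.exit(1)
    line = line.rstrip("\n")
    if line.strip() == "":
        return None                  # blank line: skip this member
    if line == ".":
        return name                  # single period: keep the name unchanged
    return line                      # anything else: the replacement name
```

In real use the two callables could come from /dev/tty opened once for reading and once for writing (flushing the writer before each read); if /dev/tty cannot be opened, pax is required to exit with a nonzero status before prompting.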
The -k option was added to address international concerns about the dangers involved in the character set transformations of -e (if the target character set were different than the source, the file names might be transformed into names matching existing files) and was made more general to also protect files transferred between file systems with different {NAME_MAX} values (truncating a filename on a smaller system might also inadvertently overwrite existing files). As stated, it prevents any overwriting, even if the target file is older than the source, which is seen as a generally useful feature anyway.

It is almost certain that appropriate privileges will be required for pax to accomplish parts of this specification. Specifically, creating files of type block special or character special, restoring file access times unless the files are owned by the user (the -t option), or preserving file owner, group, and mode (the -p option) will all probably require appropriate privileges.

Some of the file characteristics referenced in this specification may not be supported by some archive formats. For example, neither the tar nor cpio formats contain the file access time. For this reason, the e specification character has been provided, intended to cause all file characteristics specified in the archive to be retained.

It is required that extracted directories, by default, have their access and modification times and permissions set to the values specified in the archive. This has obvious problems in that the directories are almost certainly modified after being extracted and that directory permissions may not permit file creation. One possible solution is to create directories with the mode specified in the archive, as modified by the umask of the user, plus sufficient permissions to allow file creation.
After all files have been extracted, pax would then reset the access and modification times and permissions as necessary.

When the -r option is specified, and the -w option is not, implementations are permitted to overwrite files when the archive has multiple members with the same name. This may fail, of course, if permissions on the first version of the file do not permit it to be overwritten.

END_RATIONALE

4.49 pr - Print files

4.49.1 Synopsis

    pr [+page] [-column] [-adFmrt] [-e[char][gap]] [-h header] [-i[char][gap]] [-l lines] [-n[char][width]] [-o offset] [-s[char]] [-w width] [file ...]

4.49.2 Description

The pr utility is a printing and pagination filter. If multiple input files are specified, each shall be read, formatted, and written to standard output. By default, the input shall be separated into 66-line pages, each with:

  - A 5-line header that includes the page number, date, time, and the pathname of the file.

  - A 5-line trailer consisting of blank lines.

If standard output is associated with a terminal, diagnostic messages shall be deferred until the pr utility has completed processing.

When options specifying multicolumn output are specified, output text columns shall be of equal width; input lines that do not fit into a text column shall be truncated. By default, text columns shall be separated with at least one <blank>.

4.49.3 Options

The pr utility shall conform to the utility argument syntax guidelines described in 2.10.2, except that: the page option has a '+' delimiter; page and column can be multidigit numbers; some of the option-arguments are optional; and some of the option-arguments cannot be specified as separate arguments from the preceding option letter.
In particular, the -s option does not allow the option letter to be separated from its argument, and the options -e, -i, and -n require that both arguments, if present, not be separated from the option letter.

The following options shall be supported by the implementation. In the following option descriptions, column, lines, offset, page, and width are positive decimal integers; gap is a nonnegative decimal integer.

  +page       Begin output at page number page of the formatted input.

  -column     Produce output that is column columns wide (default shall be 1) and is written down each column in the order in which the text is received from the input file. This option should not be used with -m. The options -e and -i shall be assumed for multiple text-column output. Whether or not text columns are balanced is unspecified, but a text column shall never exceed the length of the page (see the -l option). When used with -t, use the minimum number of lines to write the output.

  -a          Modify the effect of the -column option so that the columns are filled across the page in a round-robin order (e.g., when column is 2, the first input line heads column 1, the second heads column 2, the third is the second line in column 1, etc.).

  -d          Produce output that is double-spaced; append an extra <newline> following every <newline> found in the input.

  -e[char][gap]
              Expand each input <tab> to the next greater column position specified by the formula n*gap+1, where n is an integer > 0. If gap is zero or is omitted, it shall default to 8. All <tab> characters in the input shall be expanded into the appropriate number of <space>s. If any nondigit character, char, is specified, it shall be used as the input tab character.
  -F          Use a <form-feed> character for new pages, instead of the default behavior that uses a sequence of <newline> characters.

  -h header   Use the string header to replace the contents of the file operand in the page header. See 4.49.6.1.

  -i[char][gap]
              In output, replace multiple <space>s with <tab>s wherever two or more adjacent <space>s reach column positions gap+1, 2*gap+1, 3*gap+1, etc. If gap is zero or is omitted, default <tab> settings at every eighth column position shall be assumed. If any nondigit character, char, is specified, it shall be used as the output <tab> character.

  -l lines    Override the 66-line default and reset the page length to lines. If lines is not greater than the sum of both the header and trailer depths (in lines), the pr utility shall suppress both the header and trailer, as if the -t option were in effect.

  -m          Merge files. Standard output shall be formatted so the pr utility writes one line from each file specified by a file operand, side by side into text columns of equal fixed widths, in terms of the number of column positions. Implementations shall support merging of at least nine file operands.

  -n[char][width]
              Provide width-digit line numbering (default for width shall be 5). The number shall occupy the first width column positions of each text column of default output or each line of -m output. If char (any nondigit character) is given, it shall be appended to the line number to separate it from whatever follows (default for char shall be a <tab>).

  -o offset   Each line of output shall be preceded by offset <space>s. If the -o option is not specified, the default offset shall be zero. The space taken shall be in addition to the output line width (see the -w option below).
  -r          Write no diagnostic reports on failure to open files.

  -s[char]    Separate text columns by the single character char instead of by the appropriate number of <space>s (default for char shall be the <tab> character).

  -t          Write neither the five-line identifying header nor the five-line trailer usually supplied for each page. Quit writing after the last line of each file without spacing to the end of the page.

  -w width    Set the width of the line to width column positions for multiple text-column output only. If the -w option is not specified and the -s option is not specified, the default width shall be 72. If the -w option is not specified and the -s option is specified, the default width shall be 512. For single column output, input lines shall not be truncated.

4.49.4 Operands

The following operand shall be supported by the implementation:

  file        A pathname of a file to be written. If no file operands are specified, or if a file operand is -, the standard input shall be used.

4.49.5 External Influences

4.49.5.1 Standard Input

The standard input shall be used only if no file operands are specified, or if a file operand is -. See Input Files.

4.49.5.2 Input Files

The input files shall be text files.

4.49.5.3 Environment Variables

The following environment variables shall affect the execution of pr:

  LANG        This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

  LC_ALL      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
  LC_CTYPE    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files) and which characters are defined as printable (character class print). Nonprintable characters shall still be written to standard output, but shall not be counted for the purpose of column-width and line-length calculations.

  LC_MESSAGES This variable shall determine the language in which messages should be written.

  LC_TIME     This variable shall determine the format of the date and time for use in writing header lines.

  TZ          This variable shall determine the time zone for use in writing header lines.

4.49.5.4 Asynchronous Events

If pr receives an interrupt while writing to a terminal, it shall flush all accumulated error messages to the screen before terminating.

4.49.6 External Effects

4.49.6.1 Standard Output

The pr utility output shall be a paginated version of the original file (or files). This pagination shall be accomplished using either <form-feed>s or a sequence of <newline>s, as controlled by the -F option. Page headers shall be generated unless the -t option is specified. The page headers shall be of the form:

    "\n\n%s %s Page %d\n\n\n", <output of date>, <file>, <page number>

In the POSIX Locale, the <output of date> field, representing the date and time of last modification of the input file (or the current date and time if the input file is standard input), shall be equivalent to the output of the following command as it would appear if executed at the given time:

    date "+%b %e %H:%M %Y"

without the trailing <newline>, if the page being written is from standard input.
If the page being written is not from standard input, in the POSIX Locale, the same format shall be used, but the time used shall be the modification time of the file corresponding to file instead of the current time. When the LC_TIME locale category is not set to the POSIX Locale, a different format and order of presentation of this field may be used.

If the standard input is used instead of a file operand, the <file> field shall be replaced by a null string.

If the -h option is specified, the <file> field shall be replaced by the header argument.

4.49.6.2 Standard Error

Used only for diagnostic messages.

4.49.6.3 Output Files

None.

4.49.7 Extended Description

None.

4.49.8 Exit Status

The pr utility shall exit with one of the following values:

   0    All files were written successfully.
  >0    An error occurred.

4.49.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.49.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

To print a numbered list of all files in the current directory:

    ls -a | pr -n -h "Files in $(pwd)."

History of Decisions Made

This utility is one of those that does not follow the Utility Syntax Guidelines because of its historical origins. The working group could have added new options that obeyed the guidelines (and marked the old options obsolescent) or devised an entirely new utility; there are examples of both actions in this standard. For this utility, it chose to leave some of the options as they are because of their heavy usage by existing applications.
However, due to interest in the international community, the developers of the standard have agreed to provide an alternative syntax for the next version of this standard that conforms to the spirit of the Utility Syntax Guidelines. This new syntax will be accompanied by the existing syntax, marked as obsolescent. System implementors are encouraged to develop and promulgate a new syntax for pr, perhaps using a different utility name, that can be adopted for the next version of this standard.

Implementations are required to accept option-arguments to the -h, -l, -o, and -w options whether presented as part of the same argument or as a separate argument to pr, as suggested by the utility syntax guidelines. The -n and -s options, however, are specified as in historical practice because they are frequently specified without their optional arguments. If a <blank> were allowed before the option-argument in these cases, a file operand could mistakenly be interpreted as an option-argument in historical applications.

Historical implementations of the pr utility have differed in the action taken for the -f option. BSD uses it as described here for the -F option; System V uses it to change trailing <newline>s on each page to a <form-feed> and, if standard output is a TTY device, sends an <alert> to standard error and reads a line from /dev/tty before the first page. Draft 9 incorrectly specified part of the System V behavior, raising several ballot objections. There were strong arguments from both sides of this issue concerning existing practice and additional arguments against the System V -f behavior, on the grounds that it was not a modular design to have the behavior of an option change depending on where output is directed. Therefore, the -f option is not specified and the -F option has been added.
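The attached option-argument form discussed above, together with the -e tab-stop formula n*gap+1 from 4.49.3, can be illustrated with a short C sketch. It is not taken from any historical pr: parse_char_gap() and expand_tabs() are hypothetical names, a single-byte locale (one byte per column) is assumed, and parse_char_gap() only reads the text that immediately follows the option letter as [char][number].

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: interpret the text attached to -e, -i, or -n
   (everything after the option letter) as [char][number]. A leading
   nondigit becomes *ch; the digits after it become *num. An omitted or
   zero number falls back to default_num (8 for -e and -i, 5 for -n).
   Returns 0 if the whole text was used, -1 if characters were left over.
   No overflow checking is done on the number. */
int parse_char_gap(const char *attached, char default_char, int default_num,
                   char *ch, int *num)
{
    const char *p = attached;

    *ch = default_char;
    *num = 0;
    if (*p != '\0' && !isdigit((unsigned char)*p))
        *ch = *p++;
    while (isdigit((unsigned char)*p))
        *num = *num * 10 + (*p++ - '0');
    if (*num == 0)
        *num = default_num;
    return *p == '\0' ? 0 : -1;
}

/* Expand each tab_char in line so that the following character lands on
   the next column greater than the current one of the form n*gap+1
   (columns numbered from 1). Assumes one byte per column. Stores at most
   outsz-1 bytes plus a terminating NUL; returns the bytes stored. */
size_t expand_tabs(const char *line, char tab_char, int gap,
                   char *out, size_t outsz)
{
    size_t len = 0;
    int col = 1;                    /* column of the next output byte */

    assert(gap > 0 && outsz > 0);
    for (const char *p = line; *p != '\0'; p++) {
        if (*p == tab_char) {
            do {                    /* a tab always yields at least one <space> */
                if (len + 1 < outsz)
                    out[len++] = ' ';
                col++;
            } while ((col - 1) % gap != 0);
        } else {
            if (len + 1 < outsz)
                out[len++] = *p;
            col++;
        }
    }
    out[len] = '\0';
    return len;
}
```

For example, -e:4 selects ':' as the input tab character with gap 4, so the line ab:c becomes ab followed by two <space>s and c, which places c in column 5.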
The -p option was omitted since it represents a purely interactive usage.

The <output of date> field in the page header format is specified only for the POSIX Locale. As noted, the format can be different in other locales. No mechanism for defining this is present in this standard, as the appropriate vehicle is a messaging system; i.e., the format should be specified as a ``message.''

END_RATIONALE

4.50 printf - Write formatted output

4.50.1 Synopsis

    printf format [argument ...]

4.50.2 Description

The printf utility shall write formatted operands to the standard output. The argument operands shall be formatted under control of the format operand.

4.50.3 Options

None.

4.50.4 Operands

The following operands shall be supported by the implementation:

  format      A string describing the format to use to write the remaining operands; see 4.50.7.

  argument    The strings to be written to standard output, under the control of format; see 4.50.7.

4.50.5 External Influences

4.50.5.1 Standard Input

None.

4.50.5.2 Input Files

None.

4.50.5.3 Environment Variables

The following environment variables shall affect the execution of printf:

  LANG        This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

  LC_ALL      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
  LC_CTYPE    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

  LC_MESSAGES This variable shall determine the language in which messages should be written.

  LC_NUMERIC  This variable shall determine the locale for numeric formatting. It shall affect the format of numbers written using the e, E, f, g, and G conversion characters (if supported).

4.50.5.4 Asynchronous Events

Default.

4.50.6 External Effects

4.50.6.1 Standard Output

See 4.50.7.

4.50.6.2 Standard Error

Used only for diagnostic messages.

4.50.6.3 Output Files

None.

4.50.7 Extended Description

The format operand shall be used as the format string described in 2.12 with the following exceptions:

  (1)  A <space> character in the format string, in any context other than a flag of a conversion specification, shall be treated as an ordinary character that is copied to the output.

  (2)  A Δ character in the format string shall be treated as a Δ character, not as a <space>.

  (3)  In addition to the escape sequences shown in Table 2-15 (see 2.12), \ddd, where ddd is a one-, two-, or three-digit octal number, shall be written as a byte with the numeric value specified by the octal number.

  (4)  The implementation shall not precede or follow output from the d or u conversion specifications with <blank>s not specified by the format operand.

  (5)  The implementation shall not precede output from the o conversion specification with zeroes not specified by the format operand.

  (6)  The e, E, f, g, and G conversion specifications need not be supported.

  (7)  An additional conversion character, b, shall be supported as follows. The argument shall be taken to be a string that may contain backslash-escape sequences.
       The following backslash-escape sequences shall be supported:

       (a)  The escape sequences listed in Table 2-15, which shall be converted to the characters they represent;

       (b)  \0ddd, where ddd is a zero-, one-, two-, or three-digit octal number that shall be converted to a byte with the numeric value specified by the octal number;

       (c)  \c, which shall not be written and shall cause printf to ignore any remaining characters in the string operand containing it, any remaining string operands, and any additional characters in the format operand.

       The interpretation of a backslash followed by any other sequence of characters is unspecified.

       Bytes from the converted string shall be written until the end of the string or the number of bytes indicated by the precision specification is reached. If the precision is omitted, it shall be taken to be infinite, so all bytes up to the end of the converted string shall be written.

  (8)  For each specification that consumes an argument, the next argument operand shall be evaluated and converted to the appropriate type for the conversion as specified below.

  (9)  The format operand shall be reused as often as necessary to satisfy the argument operands. Any extra c or s conversion specifications shall be evaluated as if a null string argument were supplied; other extra conversion specifications shall be evaluated as if a zero argument were supplied. If the format operand contains no conversion specifications and argument operands are present, the results are unspecified.

  (10) If a character sequence in the format operand begins with a % character, but does not form a valid conversion specification, the behavior is unspecified.
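The %b conversion in item (7) can be sketched in C roughly as follows. This is an illustration under stated assumptions, not a reference implementation: expand_b() is a hypothetical name, Table 2-15 is assumed to hold the C-style escapes \\, \a, \b, \f, \n, \r, \t, and \v, the unspecified "backslash followed by anything else" case simply copies both bytes, and octal values above 0377 wrap modulo 256. Unlike item (3), where \ddd in the format operand starts directly with a digit, %b requires the leading 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the %b conversion. Handles the C-style escapes
   assumed to be in Table 2-15 (\\ \a \b \f \n \r \t \v), \0ddd, and \c.
   Other backslash sequences (unspecified by the standard) are copied
   unchanged. Returns the number of bytes stored in out, which may include
   NUL bytes, and sets *stop to 1 if \c ended processing. */
size_t expand_b(const char *arg, unsigned char *out, size_t outsz, int *stop)
{
    size_t len = 0;

    assert(arg != NULL && out != NULL && stop != NULL);
    *stop = 0;
    for (const char *p = arg; *p != '\0'; p++) {
        unsigned int c = (unsigned char)*p;

        if (*p == '\\' && p[1] != '\0') {
            switch (*++p) {
            case '\\': c = '\\'; break;
            case 'a':  c = '\a'; break;
            case 'b':  c = '\b'; break;
            case 'f':  c = '\f'; break;
            case 'n':  c = '\n'; break;
            case 'r':  c = '\r'; break;
            case 't':  c = '\t'; break;
            case 'v':  c = '\v'; break;
            case 'c':            /* write nothing more at all */
                *stop = 1;
                return len;
            case '0':            /* \0 plus zero to three octal digits */
                c = 0;
                for (int i = 0; i < 3 && p[1] >= '0' && p[1] <= '7'; i++)
                    c = c * 8 + (unsigned int)(*++p - '0');
                break;
            default:             /* unspecified: keep the backslash too */
                if (len < outsz)
                    out[len++] = '\\';
                c = (unsigned char)*p;
                break;
            }
        }
        if (len < outsz)
            out[len++] = (unsigned char)c;   /* values > 0377 wrap mod 256 */
    }
    return len;
}
```

A caller would then honor any precision by writing only the first precision bytes of out, and would stop processing further operands and format text when *stop is set.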
The argument operands shall be treated as strings if the corresponding conversion character is b, c, or s; otherwise, each shall be evaluated as a C constant, as described by the C Standard {7}, with the following extensions:

  - A leading plus or minus sign shall be allowed.

  - If the leading character is a single- or double-quote, the value shall be the numeric value in the underlying code set of the character following the single- or double-quote.

If an argument operand cannot be completely converted into an internal value appropriate to the corresponding conversion specification, a diagnostic message shall be written to standard error and the utility shall not exit with a zero exit status, but shall continue processing any remaining operands and shall write the value accumulated at the time the error was detected to standard output.

4.50.8 Exit Status

The printf utility shall exit with one of the following values:

   0    Successful completion.
  >0    An error occurred.

4.50.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.50.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

To alert the user and then print and read a series of prompts:

    printf "\aPlease fill in the following: \nName: "
    read name
    printf "Phone number: "
    read phone

To read out a list of right and wrong answers from a file, calculate the percentage right, and print them out. The numbers are right-justified and separated by a single <tab>. The percentage is written to one decimal place of accuracy.
    while read right wrong ; do
        percent=$(echo "scale=1;($right*100)/($right+$wrong)" | bc)
        printf "%2d right\t%2d wrong\t(%s%%)\n" \
            $right $wrong $percent
    done < database_file

The command:

    printf "%5d%4d\n" 1 21 321 4321 54321

produces:

        1  21
      3214321
    54321   0

Note that the format operand is used three times to print all of the given strings and that a 0 was supplied by printf to satisfy the last %4d conversion specification.

The printf utility is required to notify the user when conversion errors are detected while producing numeric output; thus, the following results would be expected on an implementation with 32-bit two's-complement integers when %d is specified as the format operand:

    Argument       Standard Output   Diagnostic Output
    -----------    ---------------   ------------------------------------------
    5a             5                 printf: "5a" not completely converted
    9999999999     2147483647        printf: "9999999999" arithmetic overflow
    -9999999999    -2147483648       printf: "-9999999999" arithmetic overflow
    ABC            0                 printf: "ABC" expected numeric value

The diagnostic message format is not specified, but these examples convey the type of information that should be reported. Note that the value shown on standard output is what would be expected as the return value from the C Standard {7} function strtol(). A similar correspondence exists between %u and strtoul(), and between %e, %f, and %g (if the implementation supports floating-point conversions) and strtod().
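The reuse rule of item (9) in 4.50.7 can be checked against the %5d%4d example above with a deliberately reduced C sketch. mini_printf() and append() are hypothetical helpers: only %% and %<width>d are recognized, escapes such as \n are assumed to have been decoded by the caller, and arguments are converted with a plain strtol() call without the error reporting the standard requires.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append s to out (capacity outsz), truncating if needed and keeping
   out NUL-terminated; *len is the current length of out. */
static void append(char *out, size_t outsz, size_t *len, const char *s)
{
    size_t n = strlen(s);

    if (*len + n >= outsz)
        n = outsz - 1 - *len;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
}

/* Hypothetical reduced printf: only "%%" and "%<width>d" are recognized,
   and all other bytes are copied as-is. The format is rescanned while
   arguments remain; a %d with no argument left receives 0. An unsupported
   conversion stops output early. Returns the length written to out. */
size_t mini_printf(char *out, size_t outsz, const char *fmt,
                   int argc, const char *const argv[])
{
    size_t len = 0;
    int argi = 0;

    assert(outsz > 0);
    out[0] = '\0';
    do {
        int conversions = 0;

        for (const char *p = fmt; *p != '\0'; p++) {
            char piece[32];

            if (*p != '%') {
                piece[0] = *p;
                piece[1] = '\0';
            } else if (p[1] == '%') {
                strcpy(piece, "%");
                p++;
            } else {
                int width = 0;

                while (isdigit((unsigned char)p[1]))
                    width = width * 10 + (*++p - '0');
                if (p[1] != 'd')
                    return len;
                p++;
                long v = argi < argc ? strtol(argv[argi++], NULL, 10) : 0;
                conversions++;
                snprintf(piece, sizeof piece, "%*ld", width, v);
            }
            append(out, outsz, &len, piece);
        }
        if (conversions == 0)
            break;                  /* never rescan a format with no %d */
    } while (argi < argc);
    return len;
}
```

Called with the format "%5d%4d\n" and the arguments 1 21 321 4321 54321, the sketch makes three passes over the format and supplies 0 for the final %4d, giving the same three lines as the example above.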
In a locale using ISO/IEC 646 {1} as the underlying code set, the command:

    printf "%d\n" 3 +3 -3 \'3 \"+3 "'-3"

produces:

    3     Numeric value of constant 3
    3     Numeric value of constant 3
    -3    Numeric value of constant -3
    51    Numeric value of the character ``3'' in ISO/IEC 646 {1} code set
    43    Numeric value of the character ``+'' in ISO/IEC 646 {1} code set
    45    Numeric value of the character ``-'' in ISO/IEC 646 {1} code set

Note that in a locale with multibyte characters, the value of a character is intended to be the value of the equivalent of the wchar_t representation of the character as described in C Standard {7}.

History of Decisions Made

The printf utility was added to provide functionality that has historically been provided by echo. However, due to irreconcilable differences in the various versions of echo extant, the version in this standard has few special features, leaving those to this new printf utility, which is based on one in the Ninth Edition at AT&T Bell Labs.

The Extended Description almost exactly matches the C Standard {7} printf() function, although it is described in terms of the file format notation in 2.12.

The floating point formatting conversion specifications are not required because all arithmetic in the shell is integer arithmetic. The awk utility performs floating point calculations and provides its own printf function. The bc utility can perform arbitrary-precision floating point arithmetic, but doesn't provide extensive formatting capabilities. (This printf utility cannot really be used to format bc output; it does not support arbitrary precision.) Implementations are encouraged to support the floating point conversions as an extension.
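The strtol() correspondence described in this rationale, together with the leading-quote extension shown in the ISO/IEC 646 example, might be sketched in C as follows. convert_signed() and the conv_status names are hypothetical, only single-byte characters are handled after a quote, and base 0 is passed so that strtol() accepts decimal, octal (leading 0), and hexadecimal (leading 0x) C integer constants. On systems where long is 64 bits, 9999999999 converts without overflow, so the overflow rows of the 32-bit table above would not occur there.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result codes for converting a numeric argument operand. */
enum conv_status {
    CONV_OK,
    CONV_NOT_NUMERIC,   /* no digits at all, e.g. "ABC" */
    CONV_PARTIAL,       /* trailing characters left, e.g. "5a" */
    CONV_OVERFLOW       /* out of range for long */
};

/* Convert an argument for a signed integer conversion such as %d.
   *value always receives the value accumulated so far, which is what
   printf would still write to standard output after a diagnostic. */
enum conv_status convert_signed(const char *arg, long *value)
{
    char *end;

    assert(arg != NULL && value != NULL);

    /* Extension: 'c or "c yields the code of c (single-byte only here). */
    if (arg[0] == '\'' || arg[0] == '"') {
        *value = (unsigned char)arg[1];
        return CONV_OK;
    }

    errno = 0;
    *value = strtol(arg, &end, 0);  /* base 0: decimal, 0octal, 0xhex */
    if (end == arg)
        return CONV_NOT_NUMERIC;
    if (errno == ERANGE)
        return CONV_OVERFLOW;       /* strtol clamped to LONG_MIN/LONG_MAX */
    if (*end != '\0')
        return CONV_PARTIAL;
    return CONV_OK;
}
```

Any status other than CONV_OK would correspond to writing a diagnostic, remembering a nonzero exit status, and still printing *value.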
Note that this printf utility, like the C Standard {7} printf() function on which it is based, makes no special provision for dealing with multibyte characters when using the %c conversion specification or when a precision is specified in a %b or %s conversion specification. Applications should be extremely cautious using either of these features when there are multibyte characters in the character set.

Field widths and precisions cannot be specified as '*' since the '*' can be replaced directly in the format operand using shell variable substitution. Implementations can also provide this feature as an extension if they so choose.

Hexadecimal character constants as defined in the C Standard {7} are not recognized in the format operand because there is no consistent way to detect the end of the constant. Octal character constants are limited to, at most, three octal digits, but hexadecimal character constants are only terminated by a nonhex-digit character. In the C Standard {7}, the ## concatenation operator can be used to terminate a constant and follow it with a hexadecimal character to be written. In the shell, concatenation occurs before the printf utility has a chance to parse the end of the hexadecimal constant.

The %b conversion specification is not part of the C Standard {7}; it has been added here as a portable way to process backslash-escapes expanded in string operands as provided by the System V version of the echo utility. See also the rationale for echo for ways to use printf as a replacement for all of the traditional versions of the echo utility.

If an argument cannot be parsed correctly for the corresponding conversion specification, the printf utility is required to report an error. Thus, overflow and extraneous characters at the end of an argument being used for a numeric conversion are to be reported as errors.
If written in C, the printf utility could use the strtol() function to parse optionally signed numeric arguments, strtoul() to parse unsigned numeric arguments, and strtod() to parse floating point arguments (if floating point conversions are supported). It is not considered an error if an argument operand is not completely used for a c or s conversion or if a ``string'' operand's first or second character is used to get the numeric value of a character.

END_RATIONALE

4.51 pwd - Return working directory name

4.51.1 Synopsis

    pwd

4.51.2 Description

The pwd utility shall write an absolute pathname of the current working directory to standard output.

4.51.3 Options

None.

4.51.4 Operands

None.

4.51.5 External Influences

4.51.5.1 Standard Input

None.

4.51.5.2 Input Files

None.

4.51.5.3 Environment Variables

The following environment variables shall affect the execution of pwd:

  LANG        This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

  LC_ALL      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

  LC_MESSAGES This variable shall determine the language in which messages should be written.

4.51.5.4 Asynchronous Events

Default.
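The requirements in 4.51.2 and 4.51.9 (write the absolute pathname, or write nothing at all to standard output on error) can be sketched in C with the POSIX.1 getcwd() function. This is an illustrative assumption, not the required implementation: write_cwd() is a hypothetical name, the initial buffer size of 256 is arbitrary, and the buffer is doubled whenever getcwd() fails with ERANGE. Nothing is written to the output stream until getcwd() has returned the complete pathname.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write the current working directory followed by a newline to out.
   Nothing is written to out unless the whole pathname was obtained,
   mirroring the no-partial-output rule for pwd. Returns 0 on success,
   or 1 after writing a diagnostic to stderr. */
int write_cwd(FILE *out)
{
    size_t size = 256;
    char *buf = NULL;

    assert(out != NULL);
    for (;;) {
        char *bigger = realloc(buf, size);

        if (bigger == NULL) {
            free(buf);
            fprintf(stderr, "pwd: out of memory\n");
            return 1;
        }
        buf = bigger;
        if (getcwd(buf, size) != NULL)
            break;
        if (errno != ERANGE) {      /* e.g. EACCES on an ancestor */
            perror("pwd");
            free(buf);
            return 1;
        }
        size *= 2;                  /* pathname longer than buf: grow */
    }
    int rc = (fprintf(out, "%s\n", buf) < 0) ? 1 : 0;
    free(buf);
    return rc;
}
```

A utility main() would simply return write_cwd(stdout) as its exit status.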
4.51.6 External Effects

4.51.6.1 Standard Output

The pwd utility output shall be an absolute pathname of the current working directory:

    "%s\n", <directory pathname>

4.51.6.2 Standard Error

Used only for diagnostic messages.

4.51.6.3 Output Files

None.

4.51.7 Extended Description

None.

4.51.8 Exit Status

The pwd utility shall exit with one of the following values:

   0    Successful completion.
  >0    An error occurred.

4.51.9 Consequences of Errors

If an error is detected, output shall not be written to standard output, a diagnostic message shall be written to standard error, and the exit status shall not be zero.

BEGIN_RATIONALE

4.51.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

Some implementations have historically provided pwd as a shell special built-in command.

History of Decisions Made

In most utilities, if an error occurs, partial output may be written to standard output. This does not happen in historical implementations of pwd. Because pwd is frequently used in existing shell scripts without checking the exit status, it is important that the historical behavior is required here; therefore, the Consequences of Errors subclause specifically disallows any partial output being written to standard output.

END_RATIONALE

4.52 read - Read a line from standard input

4.52.1 Synopsis

    read [-r] var ...

4.52.2 Description

The read utility shall read a single line from standard input.
By default, unless the -r option is specified, backslash (\) shall act as an escape character, as described in 3.2.1. The line shall be split into fields (see the definition in 3.1.3) as in the shell (see 3.6.5); the first field shall be assigned to the first variable var, the second field to the second variable var, etc. If there are fewer var operands specified than there are fields, the leftover fields and their intervening separators shall be assigned to the last var. If there are fewer fields than vars, the remaining vars shall be set to empty strings.

The setting of variables specified by the var operands shall affect the current shell execution environment; see 3.12.

4.52.3 Options

The read utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following option shall be supported by the implementation:

  -r          Do not treat a backslash character in any special way. Consider each backslash to be part of the input line.

4.52.4 Operands

The following operands shall be supported by the implementation:

  var         The name of an existing or nonexisting shell variable.

4.52.5 External Influences

4.52.5.1 Standard Input

The standard input shall be a text file.

4.52.5.2 Input Files

None.

4.52.5.3 Environment Variables

The following environment variables shall affect the execution of read:

  IFS         This variable shall determine the internal field separators used to delimit fields. See 3.5.3.

  LANG        This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
  LC_ALL      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

  LC_CTYPE    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

  LC_MESSAGES This variable shall determine the language in which messages should be written.

4.52.5.4 Asynchronous Events

Default.

4.52.6 External Effects

4.52.6.1 Standard Output

None.

4.52.6.2 Standard Error

Used only for diagnostic messages.

4.52.6.3 Output Files

None.

4.52.7 Extended Description

None.

4.52.8 Exit Status

The read utility shall exit with one of the following values:

   0    Successful completion.
  >0    End-of-file was detected or an error occurred.

4.52.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.52.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The following command:

    while read -r xx yy
    do
        printf "%s %s\n" "$yy" "$xx"
    done < input_file

prints a file with the first field of each line moved to the end of the line.

The text in 2.11.5.2 indicates that the results are undefined if an end-of-file is detected following a backslash at the end of a line when -r is not specified.

Since read affects the current shell execution environment, it is generally provided as a shell regular built-in. If it is called in a subshell or separate utility execution environment, such as one of the following:

    (read foo)
    nohup read ...
    find . -exec read ... \;

it will not affect the shell variables in the caller's environment.

History of Decisions Made

The read utility has historically been a shell built-in. It was separated off into its own clause to take advantage of the standard's richer description of functionality at the utility level.

The -r option was added to enable read to subsume the purpose of the historical line utility.

END_RATIONALE

4.53 rm - Remove directory entries

4.53.1 Synopsis

    rm [-fiRr] file ...

4.53.2 Description

The rm utility shall remove the directory entry specified by each file argument.

If either of the files dot or dot-dot is specified as the basename portion of an operand (i.e., the final pathname component), rm shall write a diagnostic message to standard error and do nothing more with such operands.

For each file the following steps shall be taken:

  (1)  If the file does not exist:

       (a)  If the -f option is not specified, write a diagnostic message to standard error.

       (b)  Go on to any remaining files.

  (2)  If file is of type directory, the following steps shall be taken:

       (a)  If neither the -R option nor the -r option is specified, write a diagnostic message to standard error, do nothing more with file, and go on to any remaining files.

       (b)  If the -f option is not specified, and either the permissions of file do not permit writing and the standard input is a terminal or the -i option is specified, write a prompt to standard error and read a line from the standard input. If the response is not affirmative, do nothing more with the current file and go on to any remaining files.
    (c) For each entry contained in file, other than dot or dot-dot, the four steps listed here [(1)-(4)] shall be taken with the entry as if it were a file operand.

    (d) If the -i option is specified, write a prompt to standard error and read a line from the standard input. If the response is not affirmative, do nothing more with the current file, and go on to any remaining files.

(3) If file is not of type directory, the -f option is not specified, and either the permissions of file do not permit writing and the standard input is a terminal or the -i option is specified, write a prompt to the standard error and read a line from the standard input. If the response is not affirmative, do nothing more with the current file and go on to any remaining files.

(4) If the current file is a directory, rm shall perform actions equivalent to the POSIX.1 {8} rmdir() function called with a pathname of the current file used as the path argument. If the current file is not a directory, rm shall perform actions equivalent to the POSIX.1 {8} unlink() function called with a pathname of the current file used as the path argument. If this fails for any reason, rm shall write a diagnostic message to standard error, do nothing more with the current file, and go on to any remaining files.

The rm utility shall be able to descend to arbitrary depths in a file hierarchy, and shall not fail due to path length limitations (unless an operand specified by the user exceeds system limitations).

4.53.3 Options

The rm utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-f    Do not prompt for confirmation.
      Do not write diagnostic messages or modify the exit status in the case of nonexistent operands. Any previous occurrences of the -i option shall be ignored.

-i    Prompt for confirmation as described in 4.53.2. Any previous occurrences of the -f option shall be ignored.

-R    Remove file hierarchies. See 4.53.2.

-r    Equivalent to -R.

4.53.4 Operands

The following operand shall be supported by the implementation:

file          A pathname of a directory entry to be removed.

4.53.5 External Influences

4.53.5.1 Standard Input

Used to read an input line in response to each prompt specified in 4.53.6.2. Otherwise, the standard input shall not be used.

4.53.5.2 Input Files

None.

4.53.5.3 Environment Variables

The following environment variables shall affect the execution of rm:

LANG          This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL        This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_COLLATE    This variable shall determine the locale for the behavior of ranges, equivalence classes, and multicharacter collating elements used in the extended regular expression defined for the yesexpr locale keyword in the LC_MESSAGES category.

LC_CTYPE      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments) and the behavior of character classes used in the extended regular expression defined for the yesexpr locale keyword in the LC_MESSAGES category.
LC_MESSAGES   This variable shall determine the processing of affirmative responses and the language in which messages should be written.

4.53.5.4 Asynchronous Events

Default.

4.53.6 External Effects

4.53.6.1 Standard Output

None.

4.53.6.2 Standard Error

Prompts shall be written to standard error under the conditions specified in 4.53.2 and 4.53.3. The prompts shall contain the file pathname, but their format is otherwise unspecified. The standard error shall also be used for diagnostic messages.

4.53.6.3 Output Files

None.

4.53.7 Extended Description

None.

4.53.8 Exit Status

The rm utility shall exit with one of the following values:

 0    If the -f option was not specified, all the named directory entries were removed; otherwise, all the existing named directory entries were removed.

>0    An error occurred.

4.53.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.53.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The SVID requires that systems not permit the removal of the last link to an executable binary file that is being executed. Thus, the rm utility can fail to remove such files.

The -i option causes rm to prompt and read the standard input even if the standard input is not a terminal, but in the absence of -i the mode prompting is not done when the standard input is not a terminal.
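The prompting rules of 4.53.2 and the -f/-i precedence of 4.53.3 can be exercised from a script. The following is a minimal sketch, assuming a conforming rm and the POSIX locale, in which a response beginning with y is affirmative. The functions should_prompt and try_rm_i are hypothetical helpers written only for this illustration: should_prompt restates the condition of paragraphs (2)(b) and (3) as a shell predicate and does not examine any file, while try_rm_i pipes a response into rm -i, so the standard input is deliberately not a terminal.

```shell
#!/bin/sh
# Hypothetical helper: the prompting condition of 4.53.2 (2)(b) and (3).
# Each argument is "yes" or "no":
#   $1 -f given   $2 -i given   $3 file not writable   $4 stdin is a terminal
should_prompt() {
    [ "$1" = no ] || return 1          # -f suppresses every prompt
    [ "$2" = yes ] && return 0         # -i prompts unconditionally
    [ "$3" = yes ] && [ "$4" = yes ]   # otherwise: unwritable file on a terminal
}

# Hypothetical helper: answer the -i prompt for file $2 with response $1,
# then report whether the file survived. The prompt goes to standard error.
try_rm_i() {
    printf '%s\n' "$1" | rm -i "$2" 2>/dev/null
    if [ -e "$2" ]; then
        echo kept
    else
        echo removed
    fi
}

# Usage, in a scratch directory:
#   : > scratch
#   try_rm_i n scratch             # kept: -i prompts although stdin is a pipe
#   try_rm_i y scratch             # removed
#   : > scratch
#   rm -i -f scratch < /dev/null   # the later -f wins, so no prompt is issued
#   rm -f no-such-file             # exit status 0: nonexistent operand ignored
```

The real rm determines writability and whether the standard input is a terminal on its own; the predicate only makes the three-way condition easier to read.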
For absolute clarity, paragraphs (2)(b) and (3) in 4.53.2, describing rm's behavior when prompting for confirmation, should be interpreted in the following manner:

    if ((NOT f_option) AND ((not_writable AND input_is_terminal) OR i_option))

It is forbidden to remove the names dot and dot-dot in order to avoid the consequences of inadvertently doing something like:

    rm -r .*

The following command:

    rm a.out core

removes the directory entries a.out and core.

The following command:

    rm -Rf junk

removes the directory junk and all its contents, without prompting.

History of Decisions Made

The exact format of the interactive prompts is unspecified. Only the general nature of the contents of prompts is specified, because implementations may desire more descriptive prompts than those used on historical implementations. Therefore, an application not using the -f option, or using the -i option, relies on the system to provide the most suitable dialogue directly with the user, based on the behavior specified.

The -r option is existing practice on all known systems. The synonym -R option is provided for consistency with the other utilities in this standard that provide options requesting recursive descent.

The behavior of the -f option in historical versions of rm is inconsistent. In general, along with ``forcing'' the unlink without prompting for permission, it always causes diagnostic messages to be suppressed and the exit status to be unmodified for nonexistent operands and files that cannot be unlinked. In some versions, however, the -f option suppresses usage messages and system errors as well. Suppressing such messages is not a service to either shell scripts or users. It is less clear that error messages regarding unlinkable files should be suppressed.
Although this is historical practice, this standard does not permit the -f option to suppress such messages.

When given the -r and -i options, historical versions of rm prompt the user twice for each directory, once before removing its contents and once before actually attempting to delete the directory entry that names it. This allows the user to ``prune'' the file hierarchy walk. Historical versions of rm were inconsistent in that some did not do the former prompt for directories named on the command line and others had obscure prompting behavior when the -i option was specified and the permissions of the file did not permit writing. The POSIX.2 rm differs little from historic practice, but does require that prompts be consistent.

Historical versions of rm were also inconsistent in that prompts were done to both standard output and standard error. POSIX.2 requires that prompts be done to standard error, for consistency with cp and mv and to allow existing extensions to rm that provide an option to list deleted files on standard output.

The rm utility is required to descend to arbitrary depths so that any file hierarchy may be deleted. This means, for example, that the rm utility cannot run out of file descriptors during its descent; if the number of file descriptors is limited, rm cannot be implemented in the historical fashion where one file descriptor is used per directory level. Also, rm is not permitted to fail because of path length restrictions, unless an operand specified by the user is longer than {PATH_MAX}.

END_RATIONALE

4.54 rmdir - Remove directories

4.54.1 Synopsis

    rmdir [-p] dir ...

4.54.2 Description

The rmdir utility shall remove the directory entry specified by each dir operand, which shall refer to an empty directory.
Directories shall be processed in the order specified. If a directory and a subdirectory of that directory are specified in a single invocation of the rmdir utility, the subdirectory shall be specified before the parent directory so that the parent directory will be empty when the rmdir utility tries to remove it.

4.54.3 Options

The rmdir utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following option shall be supported by the implementation:

-p    Remove all directories in a pathname. For each dir operand:

      (1) The directory entry it names shall be removed.

      (2) If the dir operand includes more than one pathname component, effects equivalent to the following command shall occur:

              rmdir -p $(dirname dir)

4.54.4 Operands

The following operand shall be supported by the implementation:

dir           A pathname of an empty directory to be removed.

4.54.5 External Influences

4.54.5.1 Standard Input

None.

4.54.5.2 Input Files

None.

4.54.5.3 Environment Variables

The following environment variables shall affect the execution of rmdir:

LANG          This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL        This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES   This variable shall determine the language in which messages should be written.

4.54.5.4 Asynchronous Events

Default.
4.54.6 External Effects

4.54.6.1 Standard Output

None.

4.54.6.2 Standard Error

Used only for diagnostic messages.

4.54.6.3 Output Files

None.

4.54.7 Extended Description

None.

4.54.8 Exit Status

The rmdir utility shall exit with one of the following values:

 0    Each directory entry specified by a dir operand was removed successfully.

>0    An error occurred.

4.54.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.54.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

On historical System V systems, the -p option also caused a message to be written to the standard output. The message indicated whether the whole path was removed or whether part of the path remained for some reason. The Standard Error subclause requires this diagnostic when the entire path specified by a dir operand is not removed, but does not allow the status message reporting success to be written as a diagnostic.

If a directory a in the current directory is empty except that it contains a directory b, and a/b is empty except that it contains a directory c, then:

    rmdir -p a/b/c

will remove all three directories.

The rmdir utility on System V also included an -s option that suppressed the informational message output by the -p option. This option has been omitted because the informational message is not specified by POSIX.2.

END_RATIONALE

4.55 sed - Stream editor

4.55.1 Synopsis

    sed [-n] script [file ...]

    sed [-n] [-e script] ... [-f script_file] ... [file ...]
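The two synopsis forms, combined with the rule in 4.55.3 that commands are appended to the script in the order in which the -e and -f options appear, can be checked with the following sketch. The function name order_demo and its temporary script_file are illustrative assumptions; mktemp is used for convenience and is not one of the utilities specified here.

```shell
#!/bin/sh
# Hypothetical demonstration of how sed assembles its script.
order_demo() {
    script_file=$(mktemp) || return 1
    printf 's/a/b/\n' > "$script_file"

    # The script operand form: prints "b".
    echo a | sed 's/a/b/'
    # -e first: s/b/c/ runs before s/a/b/ and finds no b, so this prints "b".
    echo a | sed -e 's/b/c/' -f "$script_file"
    # -f first: s/a/b/ yields "b", which s/b/c/ then changes to "c".
    echo a | sed -f "$script_file" -e 's/b/c/'

    rm -f "$script_file"
}
```

Swapping the two options changes the final line of output, which is why the syntax guidelines are relaxed for -e and -f.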
4.55.2 Description

The sed utility is a stream editor that shall read one or more text files, make editing changes according to a script of editing commands, and write the results to standard output. The script shall be obtained from either the script operand string or a combination of the option-arguments from the -e script and -f script_file options.

4.55.3 Options

The sed utility shall conform to the utility argument syntax guidelines described in 2.10.2, except that the order of presentation of the -e and -f options is significant.

The following options shall be supported by the implementation:

-e script       Add the editing commands specified by the script option-argument to the end of the script of editing commands. The script option-argument shall have the same properties as the script operand, described in 4.55.4.

-f script_file  Add the editing commands in the file script_file to the end of the script.

-n              Suppress the default output (in which each line, after it is examined for editing, is written to standard output). Only lines explicitly selected for output shall be written.

Multiple -e and -f options may be specified. All commands shall be added to the script in the order specified, regardless of their origin.

4.55.4 Operands

The following operands shall be supported by the implementation:

file            A pathname of a file whose contents shall be read and edited. If multiple file operands are specified, the named files shall be read in the order specified and the concatenation shall be edited. If no file operands are specified, the standard input shall be used.

script          A string to be used as the script of editing commands.
                The application shall not present a script that violates the restrictions of a text file (see 2.2.2.151), except that the final character need not be a <newline>.

4.55.5 External Influences

4.55.5.1 Standard Input

The standard input shall be used only if no file operands are specified. See Input Files.

4.55.5.2 Input Files

The input files shall be text files. The script_files named by the -f option shall consist of editing commands, one per line.

4.55.5.3 Environment Variables

The following environment variables shall affect the execution of sed:

LANG          This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL        This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_COLLATE    This variable shall determine the locale for the behavior of ranges, equivalence classes, and multicharacter collating elements within regular expressions.

LC_CTYPE      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files), and the behavior of character classes within regular expressions.

LC_MESSAGES   This variable shall determine the language in which messages should be written.

4.55.5.4 Asynchronous Events

Default.

4.55.6 External Effects

4.55.6.1 Standard Output

The input files shall be written to standard output, with the editing commands specified in the script applied. If the -n option is specified, only those input lines selected by the script shall be written to standard output.
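The difference between the default output and -n (4.55.3, 4.55.6.1) is easiest to see with the p command, which writes the pattern space in addition to whatever the end of the cycle writes. print_demo is a hypothetical function name. The third case relies on the #n convention described with the # command in 4.55.7.3; it is placed in a script_file here because that description refers to the first two characters of the file.

```shell
#!/bin/sh
# Hypothetical demonstration of default output versus -n.
print_demo() {
    # Default output on: "x" is written by p and again at the end of the
    # cycle; "y" is written once.
    printf 'x\ny\n' | sed '/x/p'
    echo ==
    # Default output suppressed: only the explicitly printed line appears.
    printf 'x\ny\n' | sed -n '/x/p'
    echo ==
    # A script_file whose first two characters are "#n" acts like -n.
    script_file=$(mktemp) || return 1
    printf '#n\n/x/p\n' > "$script_file"
    printf 'x\ny\n' | sed -f "$script_file"
    rm -f "$script_file"
}
```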
4.55.6.2 Standard Error

Used only for diagnostic messages.

4.55.6.3 Output Files

The output files shall be text files whose formats are dependent on the editing commands given.

4.55.7 Extended Description

The script shall consist of editing commands, one per line, of the following form:

    [address[,address]]command[arguments]

Zero or more <blank>s shall be accepted before the first address and before command.

In default operation, sed shall cyclically copy a line of input, less its terminating <newline>, into a pattern space (unless there is something left after a D command), apply in sequence all commands whose addresses select that pattern space, and at the end of the script copy the pattern space to standard output (except when -n is specified) and delete the pattern space. Whenever the pattern space is written to standard output or a named file, sed shall immediately follow it with a <newline>.

Some of the commands use a hold space to save all or part of the pattern space for subsequent retrieval. The pattern and hold spaces shall each be able to hold at least 8192 bytes.

4.55.7.1 sed Addresses

An address is either empty, a decimal number that counts input lines cumulatively across files, a $ character that addresses the last line of input, or a context address (which consists of a regular expression as described in 4.55.7.2, preceded and followed by a delimiter, usually a slash).

A command line with no addresses shall select every pattern space. A command line with one address shall select each pattern space that matches the address.
A command line with two addresses shall select the inclusive range from the first pattern space that matches the first address through the next pattern space that matches the second. (If the second address is a number less than or equal to the line number first selected, only one line shall be selected.) Starting at the first line following the selected range, sed shall look again for the first address. Thereafter the process shall be repeated. Editing commands can be applied to nonselected pattern spaces only by use of the negation command ! (see 4.55.7.3).

4.55.7.2 sed Regular Expressions

The sed utility shall support the basic regular expressions described in 2.8.3, with the following additions:

(1) In a context address, the construction \cREc, where c is any character other than <backslash> or <newline>, shall be identical to /RE/. If the character designated by c appears following a backslash, then it shall be considered to be that literal character, which shall not terminate the RE. For example, in the context address \xabc\xdefx, the second x stands for itself, so that the regular expression is abcxdef.

(2) The escape sequence \n shall match a <newline> embedded in the pattern space. A literal <newline> character shall not be used in the regular expression of a context address or in the substitute command.

4.55.7.3 sed Editing Commands

In the following list of commands, the maximum number of permissible addresses for each command is indicated by [0addr], [1addr], or [2addr], representing zero, one, or two addresses.

The argument text shall consist of one or more lines. Each embedded <newline> in the text shall be preceded by a backslash. Other backslashes in text shall be removed and the following character shall be treated literally.
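The addressing rules of 4.55.7.1 and the regular-expression additions of 4.55.7.2 can be exercised with generated input. address_demo is a hypothetical function, and the output noted in each comment follows from the rules above. The context-address example uses % as its delimiter instead of the x of the example in (1), because some implementations give \x a special meaning as an extension.

```shell
#!/bin/sh
# Hypothetical demonstration of sed addresses and regular expressions.
address_demo() {
    # Two addresses select an inclusive range: prints 2, 3, and 4.
    printf '%s\n' 1 2 3 4 5 | sed -n '2,4p'
    # Second address not greater than the first line selected: prints 3 only.
    printf '%s\n' 1 2 3 4 5 | sed -n '3,2p'
    # \cREc with c = %, so the slashes inside the RE need no escaping.
    printf '%s\n' /usr/bin /usr/lib | sed -n '\%/usr/lib%p'
    # Negation: p applies to every line NOT selected, printing a and c.
    printf '%s\n' a b c | sed -n '/b/!p'
    # \n matches the <newline> that N embeds: prints "left+right".
    printf '%s\n' left right | sed -e 'N' -e 's/\n/+/'
}
```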
The r and w commands take an optional rfile (or wfile) parameter, separated from the command letter by one or more <blank>s; implementations may allow zero separation as an extension. The argument rfile or the argument wfile shall terminate the command line. Each wfile shall be created before processing begins. Implementations shall support at least nine wfile arguments in the script; the actual number (≥9) that shall be supported by the implementation is unspecified. The use of the wfile parameter shall cause that file to be initially created, if it does not exist, or shall replace the contents of an existing file.

The b, r, s, t, w, y, !, and : commands shall accept additional arguments. The following synopses indicate which arguments shall be separated from the commands by a single <space>.

Two of the commands take a command-list, which is a list of sed commands separated by <newline>s, as follows:

    {
    command
    command
    ...
    }

The { can be preceded with <blank>s and can be followed with white space. The commands can be preceded by white space. The terminating } shall be preceded by a <newline> and then zero or more <blank>s.

[2addr]{command-list
}
    Execute command-list only when the pattern space is selected.

[1addr]a\
text
    Write text to standard output just before each attempt to fetch a line of input, whether by executing the N command or by beginning a new cycle.

[2addr]b [label]
    Branch to the : command bearing the label. If label is not specified, branch to the end of the script. The implementation shall support labels recognized as unique up to at least 8 characters; the actual length (≥8) that shall be supported by the implementation is unspecified.
    It is unspecified whether exceeding a label length causes an error or a silent truncation.

[2addr]c\
text
    Delete the pattern space. With 0 or 1 address or at the end of a 2-address range, place text on the output.

[2addr]d
    Delete the pattern space and start the next cycle.

[2addr]D
    Delete the initial segment of the pattern space through the first <newline> and start the next cycle.

[2addr]g
    Replace the contents of the pattern space by the contents of the hold space.

[2addr]G
    Append to the pattern space a <newline> followed by the contents of the hold space.

[2addr]h
    Replace the contents of the hold space with the contents of the pattern space.

[2addr]H
    Append to the hold space a <newline> followed by the contents of the pattern space.

[1addr]i\
text
    Write text to standard output.

[2addr]l
    (The letter ell.) Write the pattern space to standard output in a visually unambiguous form. The characters listed in Table 2-15 (see 2.12) shall be written as the corresponding escape sequence. Nonprintable characters not in Table 2-15 shall be written as one three-digit octal number (with a preceding <backslash>) for each byte in the character (most significant byte first). If the size of a byte on the system is greater than nine bits, the format used for nonprintable characters is implementation defined.

    Long lines shall be folded, with the point of folding indicated by writing <backslash><newline>; the length at which folding occurs is unspecified, but should be appropriate for the output device. The end of each line shall be marked with a $.

[2addr]n
    Write the pattern space to standard output if the default output has not been suppressed, and replace the pattern space with the next line of input.
[2addr]N
    Append the next line of input to the pattern space, using an embedded <newline> to separate the appended material from the original material. Note that the current line number changes.

[2addr]p
    Write the pattern space to standard output.

[2addr]P
    Write the pattern space, up to the first <newline>, to standard output.

[1addr]q
    Branch to the end of the script and quit without starting a new cycle.

[1addr]r rfile
    Copy the contents of rfile to standard output just before each attempt to fetch a line of input. If rfile does not exist or cannot be read, it shall be treated as if it were an empty file, causing no error condition.

[2addr]s/regular expression/replacement/flags
    Substitute the replacement string for instances of the regular expression in the pattern space. Any character other than <backslash> or <newline> can be used instead of a slash to delimit the RE and the replacement. Within the RE and the replacement, the RE delimiter itself can be used as a literal character if it is preceded by a backslash.

    An ampersand (&) appearing in the replacement shall be replaced by the string matching the RE. The special meaning of & in this context can be suppressed by preceding it by backslash. The characters \n, where n is a digit, shall be replaced by the text matched by the corresponding backreference expression (see 2.8.3.3).

    A line can be split by substituting a <newline> character into it. The application shall escape the <newline> in the replacement by preceding it by backslash. A substitution shall be considered to have been performed even if the replacement string is identical to the string that it replaces.
    The value of flags shall be zero or more of:

    n        Substitute for the nth occurrence only of the regular expression found within the pattern space.

    g        Globally substitute for all nonoverlapping instances of the regular expression rather than just the first one. If both g and n are specified, the results are unspecified.

    p        Write the pattern space to standard output if a replacement was made.

    w wfile  Write. Append the pattern space to wfile if a replacement was made.

[2addr]t [label]
    Test. Branch to the : command bearing the label if any substitutions have been made since the most recent reading of an input line or execution of a t. If label is not specified, branch to the end of the script.

[2addr]w wfile
    Append [write] the pattern space to wfile.

[2addr]x
    Exchange the contents of the pattern and hold spaces.

[2addr]y/string1/string2/
    Replace all occurrences of characters in string1 with the corresponding characters in string2. If the numbers of characters in string1 and string2 are not equal, or if any of the characters in string1 appear more than once, the results are undefined. Any character other than <backslash> or <newline> can be used instead of slash to delimit the strings. Within string1 and string2, the delimiter itself can be used as a literal character if it is preceded by a backslash.

[2addr]!command
[2addr]!{command-list
}
    Apply the command or command-list only to the lines that are not selected by the address(es).

[0addr]:label
    This command shall do nothing; it bears a label for the b and t commands to branch to.
[1addr]=
    Write the following to standard output:

        "%d\n", <current line number>

[0addr]
    An empty command shall be ignored.

[0addr]#
    The # and the remainder of the line shall be ignored (treated as a comment), with the single exception that if the first two characters in the file are #n, the default output shall be suppressed; this shall be the equivalent of specifying -n on the command line.

4.55.8 Exit Status

The sed utility shall exit with one of the following values:

 0    Successful completion.

>0    An error occurred.

4.55.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.55.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

See the rationale for cat (4.4.10) for an example sed script.

This standard requires implementations to support at least nine distinct wfiles, matching historical practice on many implementations. Implementations are encouraged to support more, but portable applications should not exceed this limit.
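Several of the editing commands in 4.55.7.3 combine into familiar idioms. The functions below are hypothetical wrappers written only for this illustration; each passes one command per -e option, consistent with the one-command-per-line script format of 4.55.7, and the comments give the output that follows from the command descriptions rather than from any one implementation's documentation.

```shell
#!/bin/sh
# Hypothetical wrappers illustrating editing commands from 4.55.7.3.

# Reverse the order of lines: G appends a <newline> and the hold space,
# h saves the result, and $p writes it once the last line is reached.
# Example: printf '%s\n' 1 2 3 | reverse_lines   prints 3, 2, 1.
reverse_lines() {
    sed -n -e '1!G' -e 'h' -e '$p'
}

# Numeric and g flags of s: prints "aaAa", then "AAAA".
subst_flags() {
    echo aaaa | sed 's/a/A/3'
    echo aaaa | sed 's/a/A/g'
}

# p flag: without -n the substituted line appears twice, with -n once,
# so this function prints three lines of "A".
p_flag() {
    echo a | sed 's/a/A/p'
    echo a | sed -n 's/a/A/p'
}

# y maps each character of string1 to the one at the same position in string2.
upcase_vowels() {
    sed 'y/aeiou/AEIOU/'
}

# t loop: each pass inserts a comma between a digit and the rightmost group
# of three digits that follows one; t branches back to :a while s succeeds.
# Example: echo 1234567 | add_commas   prints 1,234,567.
add_commas() {
    sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'ta'
}

# = writes the current line number; addressed to $, it counts the lines.
count_lines() {
    sed -n '$='
}
```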
See the rationale for 1 ed (4.20.10) for details of the format chosen, which is the same as that 1 chosen for sed. 1 The standard requires implementations to provide pattern and hold spaces of at least 8192 bytes, larger than the 4000-byte spaces used by some historical implementations, but less than the 20K byte limit used in an earlier draft. Implementations are encouraged to dynamically allocate larger pattern and hold spaces as needed. The requirements for acceptance of s and s in command lines has been made more explicit than in earlier drafts to clearly describe existing practice and remove confusion about the phrase ``protect initial blanks [sic] and tabs from the stripping that is done on every script line'' that appears in much of the historical documentation of the sed utility description of text. (Not all implementations are known to have 1 stripped s from text lines, although they all have allowed leading 1 s preceding the address on a command line.) 1 The treatment of # comments differs from the _S_V_I_D, which only allows a comment as the first line of the script, but matches BSD-derived implementations. The comment character is treated as a command and it has the same properties in terms of being accepted with leading _s; the BSD implementation has historically supported this. Earlier drafts of POSIX.2 required that a _s_c_r_i_p_t__f_i_l_e have at least one noncomment line. Some historical implementations have behaved in unexpected ways if this were not the case. The working group felt that this was incorrect behavior, and that application developers should not have to work around this feature. A correct implementation of POSIX.2 shall permit _s_c_r_i_p_t__f_i_l_es that consist only of comment lines. Earlier drafts indicated that if -e and -f options were intermixed, all -e options were processed before any -f options. This has been changed Copyright c 1991 IEEE. All rights reserved. This is an unapproved IEEE Standards Draft, subject to change. 
to process them in the order presented, because that matches existing practice and is more intuitive.

The treatment of the p flag to the s command differs between System V and BSD-based systems (actually, between Version 7 and 32V) when the default output is suppressed. In the two examples:

    echo a | sed 's/a/A/p'
    echo a | sed -n 's/a/A/p'

POSIX.2, BSD, System V documentation, and the SVID indicate that the first example should write two lines with A, whereas the second should write one. Some System V systems write the A only once in both examples, because the p flag is ignored if the -n option is not specified.

This is a case of a diametrical difference between systems that could not be reconciled through the compromise of declaring the behavior to be unspecified. The SVID/BSD/32V behavior was adopted for POSIX.2 because:

  - No known documentation for any historical system describes the interaction between the p flag and the -n option.

  - The selected behavior is more correct, as there is no technical justification for any interaction between the p flag and the -n option. A relationship between -n and the p flag might imply that they are only used together (when p should be a no-op), but this ignores valid scripts that interrupt the cyclical nature of the processing through the use of the D, d, q, or branching commands. Such scripts rely on the p suffix to write the pattern space because they do not make use of the default output at the ``bottom'' of the script.

  - Because the -n option makes the p flag a no-op, any interaction would only be useful if sed scripts were written to run both with and without the -n option. This is believed to be unlikely. It is even more unlikely that programmers have coded the p flag expecting it to be a no-op. Because the interaction was not documented, the likelihood of a programmer discovering the interaction and depending on it is further decreased.
  - Finally, scripts that break under the specified behavior will produce too much output instead of too little, which is easier to diagnose and correct.

The form of the substitute command that uses the n suffix was limited to the first 512 matches in a previous draft. This limit has been removed, because there is no reason an editor processing lines of {LINE_MAX} length should have this restriction. The command s/a/A/2047 should be able to substitute the 2047th occurrence of a on a line.

END_RATIONALE

4.56 sh - Shell, the standard command language interpreter

4.56.1 Synopsis

    sh [-aCefinuvx] [command_file [argument ...]]
    sh -c [-aCefinuvx] command_string [command_name [argument ...]]
    sh -s [-aCefinuvx] [argument ...]

4.56.2 Description

The sh utility is a command language interpreter that shall execute commands read from a command-line string, the standard input, or a specified file. The commands to be executed shall be expressed in the language described in Section 3.

4.56.3 Options

The sh utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The -a, -C, -e, -f, -n, -u, -v, and -x options are described as part of the set utility in 3.14.11. The following additional options shall be supported by the implementation:

-c   Read commands from the command_string operand. Set the value of special parameter 0 (see 3.5.2) from the value of the command_name operand, and the positional parameters ($1, $2, etc.) in sequence from the remaining argument operands. No commands shall be read from the standard input.

-i   Specify that the shell is interactive; see below.
     An implementation may treat specifying the -i option as an error if the real user ID of the calling process does not equal the effective user ID, or if the real group ID does not equal the effective group ID.

-s   Read commands from the standard input.

If there are no operands and the -c option is not specified, the -s option shall be assumed.

If the -i option is present, or if there are no operands and the shell's standard input and standard error are attached to a terminal, the shell is considered to be interactive. (See 3.1.4.) The behavior of an interactive shell is not fully specified by this standard.

NOTE: The preceding sentence is expected to change following the eventual approval of the UPE supplement.

Implementations may accept the option letters with a leading plus sign (+) instead of a leading hyphen (meaning the reverse case of the option as described in this standard). A conforming application shall protect its first operand, if it starts with a plus sign, by preceding it with the -- argument that denotes ``end of options.''

4.56.4 Operands

The following operands shall be supported by the implementation:

-         A single hyphen shall be treated as the first operand and then ignored. If both - and -- are given as arguments, or if other operands precede the single hyphen, the results are undefined.

argument  The positional parameters ($1, $2, etc.) shall be set to arguments, if any.

command_file
          The pathname of a file containing commands. If the pathname contains one or more slash characters, the implementation shall attempt to read that file; the file need not be executable.
          If the pathname does not contain a slash character:

            - The implementation shall attempt to read that file from the current working directory; the file need not be executable.

            - If the file is not in the current working directory, the implementation may perform a search for an executable file using the value of PATH, as described in 3.9.1.1.

          Special parameter 0 (see 3.5.2) shall be set to the value of command_file. If sh is called using a synopsis form that omits command_file, special parameter 0 shall be set to the value of the first argument passed to sh from its parent (e.g., argv[0] in the C binding), which is normally a pathname used to execute the sh utility.

command_name
          A string assigned to special parameter 0 when executing the commands in command_string. If command_name is not specified, special parameter 0 shall be set to the value of the first argument passed to sh from its parent (e.g., argv[0] in the C binding), which is normally a pathname used to execute the sh utility.

command_string
          A string that shall be interpreted by the shell as one or more commands, as if the string were the argument to the function in 7.1.1 [such as the system() function in the C binding]. If the command_string operand is an empty string, sh shall exit with a zero exit status.

4.56.5 External Influences

4.56.5.1 Standard Input

The standard input shall be used only if:

  (1) The -s option is specified; or
  (2) The -c option is not specified and no operands are specified; or
  (3) The script executes one or more commands that require input from standard input (such as a read command that does not redirect its input).

See Input Files.
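The sketch below shows how the operands described above populate special parameter 0 and the positional parameters. It also checks the zero exit status for an empty command_string. The operand values myname, first, second, alpha, and beta are arbitrary. The sketch assumes that the sh found on PATH is a POSIX-conforming shell.

```sh
#!/bin/sh
# Sketch: how sh's operands set $0 and the positional parameters.

# sh -c: the operand after command_string becomes $0 (command_name);
# the remaining operands become $1, $2, ...
c_out=$(sh -c 'printf "%s|%s|%s|%s\n" "$0" "$1" "$2" "$#"' myname first second)

# sh -s: commands are read from standard input; operands set $1, $2, ...
s_out=$(echo 'echo "$1 $2"' | sh -s alpha beta)

# An empty command_string causes sh to exit with a zero status.
sh -c ''
empty_status=$?

printf '%s\n' "$c_out" "$s_out" "empty status: $empty_status"
```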
When the shell is using standard input and it invokes a command that also uses standard input, the shell shall ensure that the standard input file pointer points directly after the command it has read when the command begins execution. It shall not read ahead in such a manner that any characters intended to be read by the invoked command are consumed by the shell (whether interpreted by the shell or not), or that characters that are not read by the invoked command are not seen by the shell. When the command expecting to read standard input is started asynchronously by an interactive shell, it is unspecified whether characters are read by the command or interpreted by the shell.

If the standard input to sh is a FIFO or terminal device and is set to nonblocking reads, then sh shall enable blocking reads on standard input. This shall remain in effect when the command completes.

4.56.5.2 Input Files

The input file shall be a text file, except that line lengths shall be unlimited. If the input file is empty or consists solely of blank lines and/or comments, sh shall exit with a zero exit status.

4.56.5.3 Environment Variables

The following environment variables shall affect the execution of sh:

HOME      This variable shall be interpreted as the pathname of the user's home directory. The contents of HOME are used in Tilde Expansion as described in 3.6.1.

IFS       Input field separators: a string treated as a list of characters that shall be used for field splitting and to split lines into words with the read command. See 3.6.5. If IFS is not set, the shell shall behave as if the value of IFS were the <space>, <tab>, and <newline> characters. Implementations may ignore the value of IFS in the environment at the time sh is invoked, treating IFS as if it were not set.
LANG      This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_COLLATE
          This variable shall determine the behavior of range expressions, equivalence classes, and multicharacter collating elements within pattern matching.

LC_CTYPE  This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files), which characters are defined as letters (character class alpha), and the behavior of character classes within pattern matching.

LC_MESSAGES
          This variable shall determine the language in which messages should be written.

PATH      This variable shall represent a string formatted as described in 2.6, used to effect command interpretation. See 3.9.1.1.

4.56.5.4 Asynchronous Events

Default.

4.56.6 External Effects

4.56.6.1 Standard Output

See Standard Error.

4.56.6.2 Standard Error

Except as otherwise stated (by the descriptions of any invoked utilities, or in interactive mode), standard error is used only for diagnostic messages.

4.56.6.3 Output Files

None.

4.56.7 Extended Description

See Section 3.

4.56.8 Exit Status

The sh utility shall exit with one of the following values:

  0      The script to be executed consisted solely of zero or more blank lines and/or comments.
  1-125  A noninteractive shell detected a syntax, redirection, or variable assignment error.
  127    A specified command_file could not be found by a noninteractive shell.

Otherwise, the shell shall return the exit status of the last command it invoked or attempted to invoke (see also the exit utility in 3.14.7).

4.56.9 Consequences of Errors

See 3.8.1.

BEGIN_RATIONALE

4.56.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

    sh -c "cat myfile"
    sh my_shell_cmds

The sh utility and the set special built-in utility share a common set of options. Unlike set, however, the POSIX.2 sh does not specify the use of + as an option flag. The + variety is not particularly useful, because it generally invokes the default behavior, and getopt() does not support it. However, since many historical implementations do support the plus, applications will have to guard against the relatively obscure case of a first operand with a leading plus sign.

There is a large number of environment variables used by historical implementations of sh that will not be introduced by POSIX.2 until the UPE is completed.

The KornShell ignores the contents of IFS upon entry to the script. A conforming application cannot rely on importing IFS. One justification for this, beyond security considerations, is to assist possible future shell compilers. Allowing IFS to be imported from the environment would prevent many optimizations that might otherwise be performed via dataflow analysis of the script itself.

The standard input and standard error are the files that determine whether a shell is interactive when -i is not specified. For example, sh > file and sh 2> file create interactive and noninteractive shells, respectively.
Although both accept terminal input, the results of error conditions will be different, as described in 3.8.1; in the second example, a redirection error encountered by a special built-in utility will abort the shell.

The text in Standard Input about nonblocking reads concerns an instance of sh that has been invoked, probably by a C-language program, with standard input that has been opened using the O_NONBLOCK flag; see POSIX.1 {8} open(). If the shell did not reset this flag, it would terminate immediately, because no input data would be available yet and that would be considered the same as end-of-file.

History of Decisions Made

See the Rationale for Section 3 concerning the lack of interactive features in sh. These features, including optional job control, are scheduled to be added in the User Portability Extension. The PS1 and PS2 variables are not specified because this standard, without the UPE, does not describe an interactive shell.

The options associated with a restricted shell (command name rsh and the -r option) were excluded. The developers of the standard felt that the implied level of security was not achievable, and they did not want to raise false expectations.

On systems that support set-user-ID scripts, a historical trapdoor has been to link a script to the name -i. When it is called by a sequence such as

    sh -

or by

    #! /bin/sh -

the historical systems have assumed that no option letters follow. Thus, POSIX.2 allows the single hyphen to mark the end of the options, in addition to the use of the regular -- argument, because it was felt that the older practice was so pervasive.
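The end-of-options convention discussed above can be observed with the -- argument. This standard asks a conforming application to use -- to protect a first operand that begins with a plus sign. The sketch below is minimal: +x and -x are just example operands, and the sketch exercises only the protected forms. Behavior without -- depends on whether the implementation accepts +letter options.

```sh
#!/bin/sh
# Sketch: protecting a plus-prefixed first operand with "--".
# Without "--", a shell that accepts +letter options might treat "+x"
# as "turn off -x" instead of as the first positional parameter.
plus_out=$(echo 'echo "$1"' | sh -s -- +x)

# After -c and its command_string, the remaining arguments are operands,
# so "-x" here is simply $1 rather than an option.
hyphen_out=$(sh -c 'printf "%s\n" "$1"' name -x)

printf '%s\n' "$plus_out" "$hyphen_out"
```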
An alternative approach is taken by the KornShell, where the real and effective user and group IDs must match for an interactive shell; this behavior is specifically allowed by POSIX.2. (Note: there are other problems with set-user-ID scripts that the two approaches described here do not deal with.)

END_RATIONALE

4.57 sleep - Suspend execution for an interval

4.57.1 Synopsis

    sleep time

4.57.2 Description

The sleep utility shall suspend execution for at least the integral number of seconds specified by the time operand.

4.57.3 Options

None.

4.57.4 Operands

The following operands shall be supported by the implementation:

time      A nonnegative decimal integer specifying the number of seconds for which to suspend execution.

4.57.5 External Influences

4.57.5.1 Standard Input

None.

4.57.5.2 Input Files

None.

4.57.5.3 Environment Variables

The following environment variables shall affect the execution of sleep:

LANG      This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE  This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
          This variable shall determine the language in which messages should be written.
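The rationale for this utility notes a case where the underlying delay mechanism accepts a smaller range than the required 2147483647 seconds. An implementation in that position may need several calls to reach the requested time. The sketch below illustrates that idea at the shell level. The function chunked_sleep and its limit argument are hypothetical: the limit stands in for whatever cap the underlying mechanism imposes, and neither is part of the sleep interface.

```sh
#!/bin/sh
# Hypothetical helper: chunked_sleep TOTAL LIMIT
# Suspends for TOTAL seconds using successive sleep calls of at most
# LIMIT seconds each. LIMIT models a cap on the delay mechanism's
# argument range; it is not an option of the sleep utility.
# Sets the global variable calls to the number of sleep invocations.
chunked_sleep() {
    remaining=$1
    limit=$2
    calls=0
    while [ "$remaining" -gt 0 ]; do
        if [ "$remaining" -gt "$limit" ]; then
            step=$limit
        else
            step=$remaining
        fi
        sleep "$step" || return        # pass along any sleep failure
        remaining=$((remaining - step))
        calls=$((calls + 1))
    done
}

chunked_sleep 2 1      # two 1-second calls
echo "calls: $calls"
```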
4.57.5.4 Asynchronous Events

If the sleep utility receives a SIGALRM signal, one of the following actions shall be taken:

  (1) Terminate normally with a zero exit status.
  (2) Effectively ignore the signal.
  (3) Provide the default behavior for signals described in 2.11.5.4. This could include terminating with a nonzero exit status.

The sleep utility shall take the standard action for all other signals; see 2.11.5.4.

4.57.6 External Effects

4.57.6.1 Standard Output

None.

4.57.6.2 Standard Error

Used only for diagnostic messages.

4.57.6.3 Output Files

None.

4.57.7 Extended Description

None.

4.57.8 Exit Status

The sleep utility shall exit with one of the following values:

  0   The execution was successfully suspended for at least time seconds, or a SIGALRM signal was received (see 4.57.5.4).
 >0   An error occurred.

4.57.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.57.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The exit status is allowed to be zero when sleep is interrupted by the SIGALRM signal. Most implementations of this utility rely on the arrival of that signal to notify them that the requested finishing time has been successfully attained, so such implementations do not distinguish this situation from the successful completion case. Other implementations are allowed to catch the signal and go back to sleep until the requested time expires, or to provide the normal signal termination procedures.

History of Decisions Made

As with all other utilities that take integral operands and do not specify subranges of allowed values, sleep is required by this standard to deal with time requests of up to 2147483647 seconds.
This may mean that some implementations will have to make multiple calls to the underlying operating system's delay mechanism if its argument range is less than this.

END_RATIONALE

4.58 sort - Sort, merge, or sequence check text files

4.58.1 Synopsis

    sort [-m] [-o output] [-bdfinru] [-t char] [-k keydef] ... [file ...]
    sort -c [-bdfinru] [-t char] [-k keydef] ... [file]

Obsolescent Versions:

    sort [-mu] [-o output] [-bdfinr] [-t char] [+pos1[-pos2]] ... [file ...]
    sort -c [-u] [-bdfinr] [-t char] [+pos1[-pos2]] ... [file]

4.58.2 Description

The sort utility shall perform one of the following functions:

  (1) Sort lines of all the named files together and write the result to the specified output.
  (2) Merge lines of all the named (presorted) files together and write the result to the specified output.
  (3) Check that a single input file is correctly presorted.

Comparisons shall be based on one or more sort keys extracted from each line of input (or the entire line if no sort keys are specified), and shall be performed using the collating sequence of the current locale.

4.58.3 Options

The sort utility shall conform to the utility argument syntax guidelines described in 2.10.2, with three exceptions. In the obsolescent versions, the notation +pos1 -pos2 uses a nonstandard prefix and multidigit option names. In both versions where the -c option is not specified, the -o output option shall be recognized after a file operand as an obsolescent feature. The -k keydef option should follow the -b, -d, -f, -i, -n, and -r options.
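The following minimal sketch demonstrates the three functions listed in 4.58.2: sort, merge with -m, and check with -c. LC_ALL=C is set only to make the byte-order results reproducible. The files under ${TMPDIR:-/tmp} are throwaway inputs invented for the example.

```sh
#!/bin/sh
# Sketch: the three modes of sort described in 4.58.2.
LC_ALL=C; export LC_ALL               # reproducible, byte-value collation

dir=${TMPDIR:-/tmp}/sort_modes.$$
mkdir "$dir" || exit 1
printf 'apple\ncherry\n' > "$dir/a"   # already sorted
printf 'banana\ndate\n' > "$dir/b"    # already sorted

# (1) Sort: order the lines of all inputs together.
sorted=$(printf 'pear\napple\nfig\n' | sort)

# (2) Merge: -m assumes each input is presorted and interleaves them.
merged=$(sort -m "$dir/a" "$dir/b")

# (3) Check: -c writes nothing to standard output; only the exit
#     status reports whether the input is in order (0) or not (1).
sort -c "$dir/a"
in_order=$?
printf 'b\na\n' | sort -c 2>/dev/null
out_of_order=$?

rm -rf "$dir"
printf '%s\n' "$sorted" "$merged" "check: $in_order $out_of_order"
```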
The following options shall be supported by the implementation:

-c   Check that the single input file is ordered as specified by the arguments and the collating sequence of the current locale. No output shall be produced; only the exit code shall be affected.

-m   Merge only; the input files shall be assumed to be already sorted.

-o output
     Specify the name of an output file to be used instead of the standard output. This file can be the same as one of the input files.

-u   Unique: suppress all but one in each set of lines having equal keys. If used with the -c option, check that there are no lines with duplicate keys, in addition to checking that the input file is sorted.

The following options shall override the default ordering rules. When ordering options appear independent of any key field specifications, the requested field ordering rules shall be applied globally to all sort keys. When attached to a specific key (see -k), the specified ordering options shall override all global ordering options for that key. In the obsolescent forms, if one or more of these options follows a +pos1 option, it shall affect only the key field specified by that preceding option.

-d   Specify that only <blank>s and alphanumeric characters, according to the current setting of LC_CTYPE, shall be significant in comparisons. The behavior is undefined for a sort key to which -i or -n also applies.

-f   Consider all lowercase characters that have uppercase equivalents, according to the current setting of LC_CTYPE, to be the uppercase equivalent for the purposes of comparison.

-i   Ignore all characters that are nonprintable, according to the current setting of LC_CTYPE.
-n   Restrict the sort key to an initial numeric string, which shall be sorted by arithmetic value. The string consists of optional <blank>s, an optional minus sign, and zero or more digits with an optional radix character and thousands separators (as defined in the current locale). An empty digit string shall be treated as zero. Leading zeros and signs on zeros shall not affect ordering.

-r   Reverse the sense of comparisons.

The treatment of field separators can be altered using the options:

-b   Ignore leading <blank>s when determining the starting and ending positions of a restricted sort key. If the -b option is specified before the first -k option, it shall be applied to all -k options. Otherwise, the -b option can be attached independently to each -k field_start or field_end option-argument (see below).

-t char
     Use char as the field separator character; char shall not be considered to be part of a field (although it can be included in a sort key). Each occurrence of char shall be significant (for example, <char><char> shall delimit an empty field). If -t is not specified, <blank> characters shall be used as default field separators; each maximal nonempty sequence of <blank> characters that follows a non-<blank> character shall be a field separator.

Sort keys can be specified using the options:

-k keydef
     The keydef argument is a restricted sort key field definition. The format of this definition is

         field_start[type][,field_end[type]]

     where field_start and field_end define a key field restricted to a portion of the line (see 4.58.7), and type is a modifier from the list of characters b, d, f, i, n, r.
     The b modifier shall behave like the -b option, but applies only to the field_start or field_end to which it is attached. The other modifiers shall behave like the corresponding options, but shall apply only to the key field to which they are attached; they shall have this effect if specified with field_start, field_end, or both. Modifiers attached to a field_start or field_end shall override any specifications made by the options. Implementations shall support at least nine occurrences of the -k option, which shall be significant in command line order. If no -k option is specified, a default sort key of the entire line shall be used.

     When there are multiple key fields, later keys shall be compared only after all earlier keys compare equal. Except when the -u option is specified, lines that otherwise compare equal shall be ordered as if none of the options -d, -f, -i, -n, or -k were present (but with -r still in effect, if it was specified) and with all bytes in the lines significant to the comparison. The order in which lines that still compare equal are written is unspecified.

+pos1   (Obsolescent.) Specify the start position of a key field. See 4.58.7.

-pos2   (Obsolescent.) Specify the end position of a key field. See 4.58.7.

4.58.4 Operands

The following operand shall be supported by the implementation:

file      A pathname of a file to be sorted, merged, or checked. If no file operands are specified, or if a file operand is -, the standard input shall be used.

4.58.5 External Influences

4.58.5.1 Standard Input

The standard input shall be used only if no file operands are specified, or if a file operand is -. See Input Files.
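The ordering options and the tie-breaking rule in the -k description above can be observed on small inline inputs. The sketch below assumes LC_ALL=C so that collation is by byte value. Under that assumption, uppercase letters precede lowercase letters, and digit strings without -n compare character by character.

```sh
#!/bin/sh
# Sketch: ordering options (-n, -r, -f) and the whole-line tie-break.
LC_ALL=C; export LC_ALL

nums='10
9
100'
plain=$(printf '%s\n' "$nums" | sort)          # character order: 10 100 9
numeric=$(printf '%s\n' "$nums" | sort -n)     # arithmetic order: 9 10 100
reversed=$(printf '%s\n' "$nums" | sort -rn)   # reversed: 100 10 9

# -f folds lowercase to uppercase for comparison; lines whose keys are
# then equal are ordered by comparing all bytes of the whole line.
folded=$(printf 'b\nA\na\nB\n' | sort -f)      # A a B b

# Key is field 2, compared numerically; the two lines with key 1 tie and
# are ordered by the whole line ("a 1" before "b 1").
keyed=$(printf 'b 1\na 1\nc 0\n' | sort -k 2,2n)

printf '%s\n' "$plain" "$numeric" "$reversed" "$folded" "$keyed"
```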
4.58.5.2 Input Files

The input files shall be text files, except that the sort utility shall add a <newline> to the end of a file ending with an incomplete last line.

4.58.5.3 Environment Variables

The following environment variables shall affect the execution of sort:

LANG      This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_COLLATE
          This variable shall determine the locale for ordering rules.

LC_CTYPE  This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files) and the behavior of character classification for the -b, -d, -f, -i, and -n options.

LC_MESSAGES
          This variable shall determine the language in which messages should be written.

LC_NUMERIC
          This variable shall determine the locale for the definition of the radix character and thousands separator for the -n option.

4.58.5.4 Asynchronous Events

Default.

4.58.6 External Effects

4.58.6.1 Standard Output

Unless the -o or -c options are in effect, the standard output shall contain the sorted input.

4.58.6.2 Standard Error

Used only for diagnostic messages. A warning message about correcting an incomplete last line of an input file may be generated, but need not affect the final exit status.

4.58.6.3 Output Files

If the -o option is in effect, the sorted input shall be placed in the file output.
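Two of the requirements above can be checked with a short sketch: -o may name one of the input files, and a missing final <newline> is supplied. The file name is an arbitrary temporary path chosen for the example. LC_ALL=C only fixes the collation.

```sh
#!/bin/sh
# Sketch: in-place sorting with -o, and completion of a last line.
LC_ALL=C; export LC_ALL

f=${TMPDIR:-/tmp}/sort_inplace.$$
printf 'cherry\napple\nbanana' > "$f"    # note: no final <newline>

# The output file may be the same as one of the input files.
sort -o "$f" "$f"

result=$(cat "$f")
# sort adds the missing <newline>, so wc counts three complete lines.
lines=$(wc -l < "$f" | tr -d ' ')
rm -f "$f"

printf '%s\n' "$result" "lines: $lines"
```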
4.58.7 Extended Description

The notation

    -k field_start[type][,field_end[type]]

shall define a key field that begins at field_start and ends at field_end inclusive, unless field_start falls beyond the end of the line or after field_end, in which case the key field shall be empty. A missing field_end shall mean the last character of the line.

A field comprises a maximal sequence of nonseparating characters and, in the absence of option -t, any preceding field separator.

The field_start portion of the keydef option-argument shall have the form:

    field_number[.first_character]

Fields and characters within fields shall be numbered starting with 1. The field_number and first_character pieces, interpreted as positive decimal integers, shall specify the first character to be used as part of a sort key. If .first_character is omitted, it shall refer to the first character of the field.

The field_end portion of the keydef option-argument shall have the form:

    field_number[.last_character]

The field_number shall be as described above for field_start. The last_character piece, interpreted as a nonnegative decimal integer, shall specify the last character to be used as part of the sort key. If last_character evaluates to zero or .last_character is omitted, it shall refer to the last character of the field specified by field_number.

If the -b option or b type modifier is in effect, characters within a field shall be counted from the first non-<blank> in the field.
(This shall apply separately to first_character and last_character.)

The obsolescent [+pos1 [-pos2]] options provide functionality equivalent to the -k keydef option. For comparison, the full formats of these options shall be:

    +field0_number[.first0_character][type] [-field0_number[.first0_character][type]]
    -k field_number[.first_character][type][,field_number[.last_character][type]]

In the obsolescent form, fields (specified by field0_number) and characters within fields (specified by first0_character) shall be numbered from zero instead of one. The -pos2 option shall specify the first character after the sort field instead of the last character in the sort field. (Therefore, field0_number and first0_character shall be interpreted as nonnegative, instead of positive, decimal integers, and there is no need for a specification of a last_character-like form.) The optional type modifiers shall be the same in both forms. If .first0_character is omitted or first0_character evaluates to zero, it shall refer to the first character of the field. Thus, the fully specified +pos1 -pos2 form:

    +w.x -y.z

shall be equivalent to:

    -k w+1.x+1,y.0       (if z == 0)
    -k w+1.x+1,y+1.z     (if z > 0)

As with the nonobsolescent forms, implementations shall support at least nine occurrences of the +pos1 option, which shall be significant in command line order.
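The conversion rule above is mechanical enough to express as a small helper. obs_to_k is a hypothetical function written for this illustration. It assumes that all four numbers w, x, y, and z are given, and it does not handle type modifiers. The calls below reproduce equivalences used in the examples in this subclause's rationale.

```sh
#!/bin/sh
# Hypothetical helper: obs_to_k W X Y Z
# Translates the obsolescent "+W.X -Y.Z" (zero-origin; -Y.Z names the
# first character AFTER the key) into the equivalent -k form:
#   Z == 0:  -k W+1.X+1,Y.0    (key ends at the last character of field Y)
#   Z >  0:  -k W+1.X+1,Y+1.Z
obs_to_k() {
    w=$1 x=$2 y=$3 z=$4
    # The format starts with %s so that printf never sees a leading "-".
    if [ "$z" -eq 0 ]; then
        printf '%s %d.%d,%d.0\n' -k "$((w + 1))" "$((x + 1))" "$y"
    else
        printf '%s %d.%d,%d.%d\n' -k "$((w + 1))" "$((x + 1))" "$((y + 1))" "$z"
    fi
}

obs_to_k 1 0 2 0    # +1 -2      ->  -k 2.1,2.0  (same key as -k 2,2)
obs_to_k 1 1 1 2    # +1.1 -1.2  ->  -k 2.2,2.2
```

Applying the helper to +2.0 -3.0 gives -k 3.1,3.0, which matches the -um example pair in the rationale.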
4.58.8 Exit Status

The sort utility shall exit with one of the following values:

     0    All input files were output successfully, or -c was specified and the input file was correctly sorted.
     1    Under the -c option, the file was not ordered as specified, or if the -c and -u options were both specified, two input lines were found with equal keys. This exit status shall not be returned if the -c option is not used.
    >1    An error occurred.

4.58.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.58.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

In the following examples, nonobsolescent and obsolescent ways of specifying sort keys are given as an aid to understanding the relationship between the two forms.

Either of the following commands sorts the contents of infile with the second field as the sort key:

    sort -k 2,2 infile
    sort +1 -2 infile

Either of the following commands sorts, in reverse order, the contents of infile1 and infile2, placing the output in outfile and using the second character of the second field as the sort key (assuming that the first character of the second field is the field separator):

    sort -r -o outfile -k 2.2,2.2 infile1 infile2
    sort -r -o outfile +1.1 -1.2 infile1 infile2
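Because sort -c reports its result only through the exit status defined in 4.58.8, scripts normally branch on that status. The following sketch wraps sort -c in a hypothetical function, check_sorted, and also illustrates the rule in 4.58.7 that a field includes its leading blanks unless b is in effect. It assumes the POSIX Locale (LC_ALL=C), where <space> collates before the lowercase letters.

```shell
#!/bin/sh
# check_sorted: hypothetical wrapper around "sort -c" that reports the
# outcome using the exit statuses defined for sort:
#   0 = already in order, 1 = out of order (or, with -u, a repeated key),
#   >1 = error.
check_sorted() {
    if LC_ALL=C sort -c "$@" 2>/dev/null; then   # discard the diagnostic
        status=0
    else
        status=$?
    fi
    case $status in
        0) echo sorted ;;
        1) echo unsorted ;;
        *) echo error ;;
    esac
}

# Without -t, field 2 keeps its leading blanks: the keys are "  b"
# (two <space>s) and " a" (one <space>), and <space> sorts before 'a'.
printf 'x  b\ny a\n' | check_sorted -k 2     # sorted
# With the b modifier the blanks are skipped, so the keys are "b" and "a".
printf 'x  b\ny a\n' | check_sorted -k 2b    # unsorted
# An unreadable input file is an error (exit status greater than 1).
check_sorted /nonexistent/file               # error
```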
Either of the following commands sorts the contents of infile1 and infile2 using the second non-<blank> character of the second field as the sort key:

    sort -k 2.2b,2.2b infile1 infile2
    sort +1.1b -1.2b infile1 infile2

Either of the following commands prints the System V password file (user database) sorted by the numeric user ID (the third colon-separated field):

    sort -t : -k 3,3n /etc/passwd
    sort -t : +2 -3n /etc/passwd

Either of the following commands prints the lines of the already sorted file infile, suppressing all but one occurrence of lines having the same third field:

    sort -um -k 3.1,3.0 infile
    sort -um +2.0 -3.0 infile

Examples in some historical documentation state that options -um with one input file keep the first in each set of lines with equal keys. This behavior was deemed to be an implementation artifact and was not made standard.

The default value for -t, <blank>, has different properties than, for example, -t "<space>". If a line contains:

    <space><space>foo

the following treatment would occur with default separation versus specifically selecting a <space>:

    Field   Default              -t "<space>"
    -----   ------------------   ------------
    1       <space><space>foo    empty
    2       empty                empty
    3       empty                foo

The leading field separator itself is included in a field when -t is not used. For example, this command returns an exit status of zero, meaning the input was already sorted:

    sort -c -k 2 <<eof
    y<tab>b
    x<space>a
    eof

(assuming that <tab> precedes <space> in the current collating sequence).

The field separator is not included in a field when it is explicitly set via -t. This is historical practice and allows usage such as sort -t "|" -k 2n <s are tolerated in doing the comparison.
If -b is enabled, rather than implied, by -n, this has unusual side effects. When a character offset is used into a column of numbers (e.g., to sort mod 100), that offset will be measured relative to the most significant digit, not to the column. Based upon a recommendation of the author of the original sort utility, the -b implication has been omitted from POSIX.2, and an application wishing to achieve the previously mentioned side effects will have to specify the -b flag explicitly.

END_RATIONALE

4.59 stty - Set the options for a terminal

4.59.1 Synopsis

    stty [ -a | -g ]
    stty operands

4.59.2 Description

The stty utility shall set or report on terminal I/O characteristics for the device that is its standard input. Without options or operands specified, it shall report the settings of certain characteristics, usually those that differ from implementation-defined defaults. Otherwise, it shall modify the terminal state according to the specified operands. Detailed information about the modes listed in the first five groups below is given in POSIX.1 {8} Section 7. Operands in the Combination Modes group (see 4.59.4.6) shall be implemented using operands in the previous groups. Some combinations of operands are mutually exclusive on some terminal types; the results of using such combinations are unspecified.

Typical implementations of this utility require a communications line configured to use a POSIX.1 {8} termios interface. On systems where none of these lines are available, and on lines not currently configured to support the POSIX.1 {8} termios interface, some of the operands need not affect terminal characteristics.

4.59.3 Options

The stty utility shall conform to the utility argument syntax guidelines described in 2.10.2.
The following options shall be supported by the implementation:

-a
      Write to standard output all the current settings for the terminal.

-g
      Write to standard output all the current settings in an unspecified form that can be used as arguments to another invocation of the stty utility on the same system. The form used shall not contain any characters that would require quoting to avoid word expansion by the shell; see 3.6.

4.59.4 Operands

The following operands shall be supported by the implementation to set the terminal characteristics:

4.59.4.1 Control Modes

parenb (-parenb)
      Enable (disable) parity generation and detection. This shall have the effect of setting (not setting) PARENB in the termios c_cflag field, as defined in POSIX.1 {8}.

parodd (-parodd)
      Select odd (even) parity. This shall have the effect of setting (not setting) PARODD in the termios c_cflag field, as defined in POSIX.1 {8}.

cs5 cs6 cs7 cs8
      Select character size, if possible. This shall have the effect of setting CS5, CS6, CS7, and CS8, respectively, in the termios c_cflag field, as defined in POSIX.1 {8}.

number
      Set terminal baud rate to the number given, if possible. If the baud rate is set to zero, the modem control lines shall no longer be asserted. This shall have the effect of setting the input and output termios baud rate values as defined in POSIX.1 {8}.

ispeed number
      Set terminal input baud rate to the number given, if possible. If the input baud rate is set to zero, the input baud rate shall be specified by the value of the output baud rate. This shall have the effect of setting the input termios baud rate values as defined in POSIX.1 {8}.
ospeed number
      Set terminal output baud rate to the number given, if possible. If the output baud rate is set to zero, the modem control lines shall no longer be asserted. This shall have the effect of setting the output termios baud rate values as defined in POSIX.1 {8}.

hupcl (-hupcl)
      Stop asserting modem control lines (do not stop asserting modem control lines) on last close. This shall have the effect of setting (not setting) HUPCL in the termios c_cflag field, as defined in POSIX.1 {8}.

hup (-hup)
      Same as hupcl (-hupcl).

cstopb (-cstopb)
      Use two (one) stop bits per character. This shall have the effect of setting (not setting) CSTOPB in the termios c_cflag field, as defined in POSIX.1 {8}.

cread (-cread)
      Enable (disable) the receiver. This shall have the effect of setting (not setting) CREAD in the termios c_cflag field, as defined in POSIX.1 {8}.

clocal (-clocal)
      Assume a line without (with) modem control. This shall have the effect of setting (not setting) CLOCAL in the termios c_cflag field, as defined in POSIX.1 {8}.

It is unspecified whether stty shall report an error if an attempt to set a Control Mode fails.

4.59.4.2 Input Modes

ignbrk (-ignbrk)
      Ignore (do not ignore) break on input. This shall have the effect of setting (not setting) IGNBRK in the termios c_iflag field, as defined in POSIX.1 {8}.

brkint (-brkint)
      Signal (do not signal) INTR on break. This shall have the effect of setting (not setting) BRKINT in the termios c_iflag field, as defined in POSIX.1 {8}.

ignpar (-ignpar)
      Ignore (do not ignore) bytes with parity errors. This shall have the effect of setting (not setting) IGNPAR in the termios c_iflag field, as defined in POSIX.1 {8}.
parmrk (-parmrk)
      Mark (do not mark) parity errors. This shall have the effect of setting (not setting) PARMRK in the termios c_iflag field, as defined in POSIX.1 {8}.

inpck (-inpck)
      Enable (disable) input parity checking. This shall have the effect of setting (not setting) INPCK in the termios c_iflag field, as defined in POSIX.1 {8}.

istrip (-istrip)
      Strip (do not strip) input characters to seven bits. This shall have the effect of setting (not setting) ISTRIP in the termios c_iflag field, as defined in POSIX.1 {8}.

inlcr (-inlcr)
      Map (do not map) NL to CR on input. This shall have the effect of setting (not setting) INLCR in the termios c_iflag field, as defined in POSIX.1 {8}.

igncr (-igncr)
      Ignore (do not ignore) CR on input. This shall have the effect of setting (not setting) IGNCR in the termios c_iflag field, as defined in POSIX.1 {8}.

icrnl (-icrnl)
      Map (do not map) CR to NL on input. This shall have the effect of setting (not setting) ICRNL in the termios c_iflag field, as defined in POSIX.1 {8}.

ixon (-ixon)
      Enable (disable) START/STOP output control. Output from the system is stopped when the system receives STOP and started when the system receives START. This shall have the effect of setting (not setting) IXON in the termios c_iflag field, as defined in POSIX.1 {8}.

ixoff (-ixoff)
      Request that the system send (not send) STOP characters when the input queue is nearly full and START characters to resume data transmission. This shall have the effect of setting (not setting) IXOFF in the termios c_iflag field, as defined in POSIX.1 {8}.
4.59.4.3 Output Modes

opost (-opost)
      Post-process output (do not post-process output; ignore all other output modes). This shall have the effect of setting (not setting) OPOST in the termios c_oflag field, as defined in POSIX.1 {8}.

4.59.4.4 Local Modes

isig (-isig)
      Enable (disable) the checking of characters against the special control characters INTR, QUIT, and SUSP. This shall have the effect of setting (not setting) ISIG in the termios c_lflag field, as defined in POSIX.1 {8}.

icanon (-icanon)
      Enable (disable) canonical input (ERASE and KILL processing). This shall have the effect of setting (not setting) ICANON in the termios c_lflag field, as defined in POSIX.1 {8}.

iexten (-iexten)
      Enable (disable) any implementation-defined special control characters not currently controlled by icanon, isig, ixon, or ixoff. This shall have the effect of setting (not setting) IEXTEN in the termios c_lflag field, as defined in POSIX.1 {8}.

echo (-echo)
      Echo back (do not echo back) every character typed. This shall have the effect of setting (not setting) ECHO in the termios c_lflag field, as defined in POSIX.1 {8}.

echoe (-echoe)
      The ERASE character shall (shall not) visually erase the last character in the current line from the display, if possible. This shall have the effect of setting (not setting) ECHOE in the termios c_lflag field, as defined in POSIX.1 {8}.

echok (-echok)
      Echo (do not echo) NL after KILL character. This shall have the effect of setting (not setting) ECHOK in the termios c_lflag field, as defined in POSIX.1 {8}.

echonl (-echonl)
      Echo (do not echo) NL, even if echo is disabled. This shall have the effect of setting (not setting) ECHONL in the termios c_lflag field, as defined in POSIX.1 {8}.
noflsh (-noflsh)
      Disable (enable) flush after INTR, QUIT, SUSP. This shall have the effect of setting (not setting) NOFLSH in the termios c_lflag field, as defined in POSIX.1 {8}.

tostop (-tostop)
      Send SIGTTOU for background output. This shall have the effect of setting (not setting) TOSTOP in the termios c_lflag field, as defined in POSIX.1 {8}.

      NOTE: Setting TOSTOP has no effect on systems not supporting the POSIX.1 {8} job control option.

4.59.4.5 Special Control Character Assignments

control-character string
      Set control-character to string. If control-character is one of the character sequences in the first column of Table 4-9, the corresponding POSIX.1 {8} control character from the second column shall be recognized. This shall have the effect of setting the corresponding element of the termios c_cc array (see POSIX.1 {8} 7.1.2).

Table 4-9 - stty Control Character Names

    control-character   POSIX.1 {8} Subscript   Description
    -----------------   ---------------------   ---------------
    eof                 VEOF                    EOF character
    eol                 VEOL                    EOL character
    erase               VERASE                  ERASE character
    intr                VINTR                   INTR character
    kill                VKILL                   KILL character
    quit                VQUIT                   QUIT character
    susp                VSUSP                   SUSP character
    start               VSTART                  START character
    stop                VSTOP                   STOP character

If string is a single character, the control character shall be set to that character.
If string is the two-character sequence "^-" or the string "undef", the control character shall be set to {_POSIX_VDISABLE}, if it is in effect for the device; if {_POSIX_VDISABLE} is not in effect for the device, it shall be treated as an error.

In the POSIX Locale, if string is a two-character sequence beginning with circumflex (^), and the second character is one of those listed in the ^c column of Table 4-10, the control character shall be set to the corresponding character value in the Value column of the table.

Table 4-10 - stty Circumflex Control Characters

    ^c     Value    ^c     Value    ^c     Value
    ----   -----    ----   -----    ----   -----
    a, A   <SOH>    l, L   <FF>     w, W   <ETB>
    b, B   <STX>    m, M   <CR>     x, X   <CAN>
    c, C   <ETX>    n, N   <SO>     y, Y   <EM>
    d, D   <EOT>    o, O   <SI>     z, Z   <SUB>
    e, E   <ENQ>    p, P   <DLE>    [      <ESC>
    f, F   <ACK>    q, Q   <DC1>    \      <FS>
    g, G   <BEL>    r, R   <DC2>    ]      <GS>
    h, H   <BS>     s, S   <DC3>    ^      <RS>
    i, I   <HT>     t, T   <DC4>    _      <US>
    j, J   <LF>     u, U   <NAK>    ?      <DEL>
    k, K   <VT>     v, V   <SYN>

min number
time number
      Set the value of min or time to number. MIN and TIME are used in noncanonical mode input processing (-icanon).

4.59.4.6 Combination Modes

saved settings
      Set the current terminal characteristics to the saved settings produced by the -g option.

evenp or parity
      Enable parenb and cs7; disable parodd.

oddp
      Enable parenb, cs7, and parodd.

-parity, -evenp, or -oddp
      Disable parenb, and set cs8.

nl (-nl)
      Enable (disable) icrnl. In addition, -nl unsets inlcr and igncr.
ek
      Reset ERASE and KILL characters back to system defaults.

sane
      Reset all modes to some reasonable, unspecified, values.

4.59.5 External Influences

4.59.5.1 Standard Input

Although no input is read from standard input, standard input is used to get the current terminal I/O characteristics and to set new terminal I/O characteristics.

4.59.5.2 Input Files

None.

4.59.5.3 Environment Variables

The following environment variables shall affect the execution of stty:

LANG
      This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments) and which characters are in the class print.

LC_MESSAGES
      This variable shall determine the language in which messages should be written.

4.59.5.4 Asynchronous Events

Default.

4.59.6 External Effects

4.59.6.1 Standard Output

If operands are specified, no output shall be produced.

If the -g option is specified, stty shall write to standard output the current settings in a form that can be used as arguments to another instance of stty on the same system.

If the -a option is specified, all of the information as described in 4.59.4 shall be written to standard output.
Unless otherwise specified, this information shall be written as <space>-separated tokens in an unspecified format, on one or more lines, with an unspecified number of tokens per line. Additional information may be written.

If no options or operands are specified, an unspecified subset of the information written for the -a option shall be written.

If speed information is written as part of the default output, or if the -a option is specified and if the terminal input speed and output speed are the same, the speed information shall be written as follows:

    "speed %d baud;", <speed>

Otherwise, speeds shall be written as:

    "ispeed %d baud; ospeed %d baud;", <ispeed>, <ospeed>

In locales other than the POSIX Locale, the word baud may be changed to something more appropriate in those locales.

If control characters are written as part of the default output, or if the -a option is specified, control characters shall be written as:

    "%s = %s;", <control-character name>, <value>

where value is either the character, or some visual representation of the character if it is nonprintable, or the string <undef> if the character is disabled.

4.59.6.2 Standard Error

Used only for diagnostic messages.

4.59.6.3 Output Files

None.

4.59.7 Extended Description

None.

4.59.8 Exit Status

The stty utility shall exit with one of the following values:

     0    The terminal options were read or set successfully.
    >0    An error occurred.

4.59.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.59.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

Since POSIX.1 {8} doesn't specify any output modes, they are not specified in this standard either.
Implementations are expected to provide stty operands corresponding to all of the output modes they support.

In many ways outside the scope of POSIX.2, stty is primarily used to tailor the user interface of the terminal, such as selecting the preferred ERASE and KILL characters. As an application programming utility, stty can be used within shell scripts to alter the terminal settings for the duration of the script.

The -g flag is designed to facilitate the saving and restoring of terminal state from the shell level. For example, a program may:

    saveterm="$(stty -g)"    # save terminal state
    stty (new settings)      # set new state
    ...                      # ...
    stty $saveterm           # restore terminal state

Since the format is unspecified, the saved value is not portable across systems. Since the -a format is so loosely specified, scripts that save and restore terminal settings should use the -g option.

History of Decisions Made

The original stty manual page was taken directly from System V and reflected the System V terminal driver termio. It has been modified to correspond to the POSIX.1 {8} terminal driver termios.

The termios section states that individual disabling of control characters is an option {_POSIX_VDISABLE}. If enabled, two conventions currently exist for specifying this: System V uses "^-", and BSD uses undef. Both are accepted by POSIX.2 stty. The other BSD convention of using the letter u was rejected because it conflicts with the actual letter u, which is an acceptable value for a control character.

Early drafts did not specify the mapping of ^c to control characters because the control characters were not specified in the POSIX Locale character set description file requirements.
The control character set is now specified in 2.4.1, so the traditional mapping is specified. Note that although the mapping corresponds to control-character key assignments on many terminals that use ISO/IEC 646 {1} (or ASCII) character encodings, the mapping specified here is to the control characters, not their keyboard encodings.

The combination options raw and cooked (-raw) were dropped from the standard because the exact values that should be set are not well understood or commonly agreed on. In particular, termios has no explicit RAW bit, and the options that should be re-enabled (-raw) are not clear. General programming practice is to save the terminal state, change the settings for the duration of the program, and then reset the state. This is easy to do within a C program; however, it is not possible for a single invocation of stty to restore the terminal state (-raw) without knowledge of the prior settings. Using the -g option and two calls to stty, a shell application could do this as described above. However, it is impossible to implement this as a single option. Also, it is not clear that changing word size and parity is appropriate. For example, requiring that cooked set cs7 and parenb would be disastrous for users working with 8-bit international character sets. In general, these options are too ill-defined to be of any use.

Since termios supports separate speeds for input and output, two new options were added to specify each distinctly.
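The save, change, and restore sequence described above can be sketched with stty -g and the echo local mode. The function name read_secret and the variable secret are hypothetical; the sketch assumes a POSIX shell, leaves terminal modes alone when standard input is not a terminal, and restores the saved state if the script is interrupted by SIGHUP, SIGINT, or SIGTERM.

```shell
#!/bin/sh
# read_secret: hypothetical helper that reads one line into $secret with
# terminal echo disabled, restoring the previous modes afterwards.
read_secret() {
    if [ ! -t 0 ]; then
        # Not a terminal: there are no modes to save or change.
        IFS= read -r secret
        return
    fi
    saved=$(stty -g)                  # opaque, system-specific state
    # Restore the saved modes if interrupted. $saved is deliberately left
    # unquoted: -g output may be several operands and never needs quoting.
    trap 'stty $saved; exit 1' HUP INT TERM
    stty -echo                        # local mode: do not echo input
    IFS= read -r secret
    status=$?
    stty $saved                       # restore the original state
    trap - HUP INT TERM
    printf '\n' >&2                   # the user's newline was not echoed
    return $status
}
```

A script would typically write its prompt to standard error and then call the function, for example printf 'Password: ' >&2; read_secret. Because the -g value is only meaningful on the system that produced it, the sketch never stores it outside the running shell.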
The ixany input mode was removed from Draft 8 on the basis that it could not be implemented on a POSIX.1 {8} system without extensions.

Some historical implementations use standard input to get and set terminal characteristics; others use standard output. Since input from a login TTY is usually restricted to the owner while output to a TTY is frequently open to the world, using standard input provides fewer chances of accidentally (or mischievously) altering the terminal settings of other users. Using standard input also allows stty -a and stty -g output to be redirected for later use. Therefore, usage of standard input is required by this standard.

The tostop option was omitted from early drafts through an oversight. It is the only option that requires job control to be effective, and thus could have gone into the UPE as a modification to stty, but since all other terminal control features are in the base standard, tostop was included as well.

END_RATIONALE

4.60 tail - Copy the last part of a file

4.60.1 Synopsis

    tail [-f] [ -c number | -n number ] [file]

Obsolescent versions:

    tail -[number][c|l][f] [file]
    tail +[number][c|l][f] [file]

4.60.2 Description

The tail utility shall copy its input file to the standard output beginning at a designated place.

Copying shall begin at the point in the file indicated by the -c number or -n number options (or the ±number portion of the argument to the obsolescent version). The option-argument number shall be counted in units of lines or bytes, according to the options -n and -c (or, in the obsolescent version, the appended option suffixes l or c).
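The relationship between an obsolescent argument and the nonobsolescent -c and -n options (specified in 4.60.3) can be sketched as a rewrite function. The name tail_obs_to_new is hypothetical, and the sketch assumes its argument is already known to be an obsolescent option; the bare -c case, which 4.60.3 resolves as the nonobsolescent -c number option, is rejected rather than translated.

```shell
#!/bin/sh
# tail_obs_to_new: hypothetical helper that rewrites one obsolescent tail
# argument of the form [+-][number][c|l][f] as nonobsolescent options.
tail_obs_to_new() {
    case $1 in
        -c) return 1 ;;          # read by the standard as "-c number"
    esac
    rest=${1#?}                  # everything after the sign
    sign=${1%"$rest"}            # the sign itself
    case $sign in
        +|-) ;;
        *) return 1 ;;
    esac
    num=${rest%%[!0-9]*}         # leading digits, possibly empty
    suffix=${rest#"$num"}        # whatever follows the digits
    unit=-n
    follow=
    case $suffix in
        '' | l) ;;               # lines are the default unit
        c)      unit=-c ;;
        f | lf) follow=-f ;;
        cf)     unit=-c follow=-f ;;
        *)      return 1 ;;      # not an obsolescent argument
    esac
    # An omitted number means 10.
    printf '%s %s%s%s\n' "$unit" "$sign" "${num:-10}" "${follow:+ $follow}"
}
```

For example, tail $(tail_obs_to_new -20c) file runs tail -c -20 file; the command substitution is left unquoted so that its output splits into separate arguments.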
Tails relative to the end of the file may be saved in an internal buffer, and thus may be limited in length. Implementations shall ensure that such a buffer, if any, is no smaller than {LINE_MAX}*10 bytes.

4.60.3 Options

The tail utility shall conform to the utility argument syntax guidelines described in 2.10.2, except that the obsolescent version accepts multicharacter options that can be preceded by a plus sign.

The following options shall be supported by the implementation in the nonobsolescent version:

-c number
      The number option-argument shall be a decimal integer whose sign affects the location in the file, measured in bytes, to begin the copying:

          Sign    Copying Starts
          ----    --------------------------------------
          +       Relative to the beginning of the file.
          -       Relative to the end of the file.
          none    Relative to the end of the file.

      The origin for counting shall be 1; i.e., -c +1 represents the first byte of the file, -c -1 the last.

-f
      If the input file is a regular file or if the file operand specifies a FIFO, do not terminate after the last line of the input file has been copied, but read and copy further bytes from the input file when they become available. If no file operand is specified and standard input is a pipe, the -f option shall be ignored. If the input file is not a FIFO, pipe, or regular file, it is unspecified whether or not the -f option shall be ignored.

-n number
      This option shall be equivalent to -c number, except the starting location in the file shall be measured in lines instead of bytes. The origin for counting shall be 1; i.e., -n +1 represents the first line of the file, -n -1 the last.

In the obsolescent version, an argument beginning with a - or + can be used as a single option.
The argument ±number with the letter c specified as a suffix shall be equivalent to -c ±number; ±number with the letter l specified as a suffix, or with neither c nor l as a suffix, shall be equivalent to -n ±number. If number is not specified in these forms, 10 shall be used. The letter f specified as a suffix shall be equivalent to specifying the -f option. If the -[number]c[f] form is used and neither number nor the f suffix is specified, it shall be interpreted as the -c number option.

In the nonobsolescent form, if neither -c nor -n is specified, -n 10 shall be assumed.

4.60.4 Operands

The following operand shall be supported by the implementation:

file
      A pathname of an input file. If no file operands are specified, the standard input shall be used.

4.60.5 External Influences

4.60.5.1 Standard Input

The standard input shall be used only if no file operands are specified. See Input Files.

4.60.5.2 Input Files

If the -c option is specified, the input file can contain arbitrary data; otherwise, the input file shall be a text file.

4.60.5.3 Environment Variables

The following environment variables shall affect the execution of tail:

LANG
      This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).
LC_MESSAGES
      This variable shall determine the language in which messages should be written.

4.60.5.4 Asynchronous Events

Default.

4.60.6 External Effects

4.60.6.1 Standard Output

The designated portion of the input file shall be written to standard output.

4.60.6.2 Standard Error

Used only for diagnostic messages.

4.60.6.3 Output Files

None.

4.60.7 Extended Description

None.

4.60.8 Exit Status

The tail utility shall exit with one of the following values:

     0    Successful completion.
    >0    An error occurred.

4.60.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.60.10 Rationale. (This subclause is not a part of P1003.2)

Usage, Examples

The nonobsolescent version of tail was created to allow conformance to the Utility Syntax Guidelines. The historical -b option was omitted because of the general nonportability of block-sized units of text. The -c option historically meant ``characters,'' but this standard indicates that it means ``bytes.'' This was selected to allow reasonable implementations when multibyte characters are possible; it was not named -b to avoid confusion with the historical -b. Note that the -c option should be used with caution when the input is a text file containing multibyte characters; it may produce output that does not start on a character boundary.

The origin of counting both lines and bytes is 1, matching all widespread historical implementations.

The restriction on the internal buffer is a compromise between the historical System V implementation of 4K and the BSD 32K.
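One way to keep a bounded buffer for tails relative to the end of the file is a ring buffer that retains only the most recent lines. The awk-based sketch below, with the hypothetical name last_lines, holds at most N lines regardless of input length. It illustrates the idea for -n -N only; it is not a description of any historical implementation, and it does not model the {LINE_MAX}*10 byte limit.

```shell
#!/bin/sh
# last_lines: hypothetical sketch of "tail -n -N" using a ring buffer of
# N slots, so memory use depends on N rather than on the input length.
last_lines() {
    n=${1:-10}                       # default of 10 lines, as in tail
    if [ "$n" -le 0 ]; then
        return 0                     # zero lines requested: print nothing
    fi
    awk -v n="$n" '
        { buf[NR % n] = $0 }         # overwrite the oldest slot
        END {
            first = (NR > n) ? NR - n + 1 : 1
            for (i = first; i <= NR; i++)
                print buf[i % n]     # oldest to newest
        }'
}

printf '1\n2\n3\n4\n5\n' | last_lines 2      # prints 4 and 5
```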
The -f option can be used to monitor the growth of a file that is being written by some other process. For example, the command:

    tail -f fred

prints the last ten lines of the file fred, followed by any lines that are appended to fred between the time tail is initiated and killed. As another example, the command:

    tail -f -c 15 fred

prints the last 15 bytes of the file fred, followed by any bytes that are appended to fred between the time tail is initiated and killed.

Although the input file to tail can be any type, the results need not be what would be expected on some character special device files or on file types not described by POSIX.1 {8}. Since the standard does not specify the block size used when doing input, tail need not read all of the data from devices that only perform block transfers.

History of Decisions Made

The developers of the standard originally decided that tail, and its frequent companion, head, were useful mostly to interactive users rather than to application programs. However, balloting input suggested that these utilities do find significant use in scripts, such as to write out portions of log files. The balloters also challenged the working group's assumption that clever use of sed could be an appropriate substitute for tail.

The -f option has been implemented as a loop that sleeps for one second and copies any bytes that are available. This is sufficient, but if more efficient methods of determining when new data are available are developed, implementations are encouraged to use them.

Historical documentation says that tail ignores the -f option if the input file is a pipe (pipe and FIFO on systems that support FIFOs).
On BSD-based systems, this has been true. On System V-based systems, it was true when input was taken from standard input, but -f behaved as it does for other files if a FIFO was named as the file operand. Since the -f option is not useful on pipes, and all historical implementations ignore -f if no file operand is specified and standard input is a pipe, POSIX.2 requires this behavior. However, since the -f option is useful on a FIFO, POSIX.2 also requires that if standard input is a FIFO or a FIFO is named, the -f option shall not be ignored. Although historical behavior does not ignore the -f option for other file types, this is unspecified so that implementations are allowed to ignore the -f option if it is known that the file cannot be extended.

An earlier draft had the synopsis line:

    tail [ -c | -l ] [-f] [-n number] [file]

This was changed to the current form based on comments and objections noting that -c was almost never used without specifying a number, and there was no need to specify -l if -n number was given.

END_RATIONALE

4.61 tee - Duplicate standard input

4.61.1 Synopsis

    tee [-ai] [file ...]

4.61.2 Description

The tee utility shall copy standard input to standard output, making a copy in zero or more files. The tee utility shall not buffer output. The options determine whether the specified files are overwritten or appended to.

4.61.3 Options

The tee utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

  -a          Append the output to the files rather than overwriting them.
  -i          Ignore the SIGINT signal.

4.61.4 Operands

The following operands shall be supported by the implementation:

  file        A pathname of an output file.
              Implementations shall support processing of at least 13 file operands.

4.61.5 External Influences

4.61.5.1 Standard Input

The standard input can be of any type.

4.61.5.2 Input Files

None.

4.61.5.3 Environment Variables

The following environment variables shall affect the execution of tee:

  LANG        This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
  LC_ALL      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
  LC_CTYPE    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).
  LC_MESSAGES This variable shall determine the language in which messages should be written.

4.61.5.4 Asynchronous Events

Default, except that if the -i option was specified, SIGINT shall be ignored.

4.61.6 External Effects

4.61.6.1 Standard Output

The standard output shall be a copy of the standard input.

4.61.6.2 Standard Error

Used only for diagnostic messages.

4.61.6.3 Output Files

If any file operands are specified, the standard input shall be copied to each named file.

4.61.7 Extended Description

None.

4.61.8 Exit Status

  0    The standard input was successfully copied to all output files.
  >0   An error occurred.
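The copying and append behavior specified above can be exercised as follows. The scratch directory and file names are hypothetical. Without -a, tee truncates each named file before writing to it.

```sh
# Scratch directory for the example (the name is only an example).
dir="${TMPDIR:-/tmp}/tee_demo.$$"
mkdir -p "$dir"

# Copy standard input to standard output and to two files at once.
printf 'first\n' | tee "$dir/a" "$dir/b" > "$dir/stdout"

# Without -a, a is truncated and rewritten; with -a, b keeps its old contents.
printf 'second\n' | tee "$dir/a" > /dev/null
printf 'third\n' | tee -a "$dir/b" > /dev/null

cat "$dir/a"    # second
cat "$dir/b"    # first, then third

# Typical pipeline use: keep a copy of the data while it flows on to wc.
printf 'x\ny\n' | tee "$dir/copy" | wc -l
```

Each tee invocation also writes to its standard output. Redirecting that output to /dev/null keeps only the file copies.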
4.61.9 Consequences of Errors

If a write to any successfully opened file operand fails, writes to other successfully opened file operands and standard output shall continue, but the exit status shall be nonzero. Otherwise, the default actions specified in 2.11.9 shall apply.

BEGIN_RATIONALE

4.61.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The tee utility is usually used in a pipeline to make a copy of the output of some utility.

The file operand is technically optional, but tee is no more useful than cat when none is specified.

History of Decisions Made

The buffering requirement means that tee is not allowed to use C Standard {7} fully-buffered or line-buffered writes. It does not mean that tee has to do one-byte reads followed by one-byte writes.

It should be noted that early versions of BSD silently ignored any invalid options and accepted a single - as an alternative to -i. They also printed the message

    "tee: cannot access %s\n", <pathname>

if unable to open a file.

Historical implementations ignore write errors. This is explicitly not permitted by this standard.

Some historical implementations use O_APPEND when providing append mode; others just lseek() to the end of file after opening the file without O_APPEND. This standard requires functionality equivalent to using O_APPEND; see 2.9.1.4.

END_RATIONALE

4.62 test - Evaluate expression

4.62.1 Synopsis

    test [expression]
    [ [expression] ]

4.62.2 Description

The test utility shall evaluate the expression and indicate the result of the evaluation by its exit status.
An exit status of zero indicates that the expression evaluated as true, and an exit status of 1 indicates that the expression evaluated as false.

In the second form of the utility, which uses [ ] rather than test, the square brackets shall be separate arguments.

4.62.3 Options

The test utility shall not recognize the -- argument in the manner specified by utility syntax guideline 10 in 2.10.2.

Implementations shall not support any options.

4.62.4 Operands

All operators and elements of primaries shall be presented as separate arguments to the test utility.

The following primaries can be used to construct expression:

  -b file     True if file exists and is a block special file.
  -c file     True if file exists and is a character special file.
  -d file     True if file exists and is a directory.
  -e file     True if file exists.
  -f file     True if file exists and is a regular file.
  -g file     True if file exists and its set-group-ID flag is set.
  -n string   True if the length of string is nonzero.
  -p file     True if file is a named pipe (FIFO).
  -r file     True if file exists and is readable.
  -s file     True if file exists and has a size greater than zero.
  -t file_descriptor
              True if the file whose file descriptor number is file_descriptor is open and is associated with a terminal.
  -u file     True if file exists and its set-user-ID flag is set.
  -w file     True if file exists and is writable. True shall indicate only that the write flag is on. The file shall not be writable on a read-only file system even if this test indicates true.
  -x file     True if file exists and is executable. True shall indicate only that the execute flag is on.
              If file is a directory, true indicates that file can be searched.
  -z string   True if the length of string string is zero.
  string      True if the string string is not the null string.
  s1 = s2     True if the strings s1 and s2 are identical.
  s1 != s2    True if the strings s1 and s2 are not identical.
  n1 -eq n2   True if the integers n1 and n2 are algebraically equal.
  n1 -ne n2   True if the integers n1 and n2 are not algebraically equal.
  n1 -gt n2   True if the integer n1 is algebraically greater than the integer n2.
  n1 -ge n2   True if the integer n1 is algebraically greater than or equal to the integer n2.
  n1 -lt n2   True if the integer n1 is algebraically less than the integer n2.
  n1 -le n2   True if the integer n1 is algebraically less than or equal to the integer n2.

A primary can be preceded by the ! operator to complement its test, as described below.

The primaries with two elements of the form:

    -primary_operator primary_operand

are known as unary primaries. The primaries with three elements in either of the two forms:

    primary_operand -primary_operator primary_operand
    primary_operand primary_operator primary_operand

are known as binary primaries. Additional implementation-defined operators and primary_operators may be provided by implementations. They shall be of the form -operator where the first character of operator is not a digit. The additional implementation-defined operators ``('' and ``)'' may also be provided by implementations.
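The following sketch exercises several of the primaries listed above, with every operator and operand passed as a separate argument. The scratch paths are hypothetical. The -r, -w, and -x primaries are left out because their results depend on the privileges of the invoking user.

```sh
# Scratch directory and files (the names are only examples).
d="${TMPDIR:-/tmp}/test_demo.$$"
mkdir "$d"
: > "$d/empty"                 # a zero-length regular file
printf 'data\n' > "$d/full"    # a regular file with contents

# File-type and size primaries.
test -d "$d"       && echo "the scratch directory is a directory"
test -f "$d/empty" && echo "empty is a regular file"
test -s "$d/empty" || echo "empty has a size of zero"
test -s "$d/full"  && echo "full has a size greater than zero"
test -e "$d/none"  || echo "none does not exist"

# String primaries; the quotes keep an empty value as a separate argument.
s=
test -z "$s"    && echo "s is the null string"
test abc != abd && echo "abc and abd are not identical"

# Integer primaries compare algebraically: 10 is greater than 9, even
# though the string "10" would sort before "9"; -5 is an operand here.
test 10 -gt 9 && echo "10 -gt 9 is true"
test -5 -lt 3 && echo "-5 -lt 3 is true"
```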
The algorithm for determining the precedence of the operators and the return value that shall be generated is based on the number of arguments presented to test. (However, when using the [...] form, the right-bracket final argument shall not be counted in this algorithm.) In the following list, $1, $2, $3, and $4 represent the arguments presented to test.

  0 arguments:
      Exit false (1).

  1 argument:
      Exit true (0) if $1 is not null; otherwise, exit false.

  2 arguments:
      - If $1 is !, exit true if $2 is null, false if $2 is not null.
      - If $1 is a unary primary, exit true if the unary test is true, false if the unary test is false.
      - Otherwise, produce unspecified results.

  3 arguments:
      - If $2 is a binary primary, perform the binary test of $1 and $3.
      - If $1 is !, negate the two-argument test of $2 and $3.
      - Otherwise, produce unspecified results.

  4 arguments:
      - If $1 is !, negate the three-argument test of $2, $3, and $4.
      - Otherwise, the results are unspecified.

  >4 arguments:
      The results are unspecified.

4.62.5 External Influences

4.62.5.1 Standard Input

None.

4.62.5.2 Input Files

None.

4.62.5.3 Environment Variables

The following environment variables shall affect the execution of test:

  LANG        This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
  LC_ALL      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
  LC_CTYPE    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).
  LC_MESSAGES This variable shall determine the language in which messages should be written.

4.62.5.4 Asynchronous Events

Default.

4.62.6 External Effects

4.62.6.1 Standard Output

None.

4.62.6.2 Standard Error

Used only for diagnostic messages.

4.62.6.3 Output Files

None.

4.62.7 Extended Description

None.

4.62.8 Exit Status

The test utility shall exit with one of the following values:

  0    expression evaluated to true.
  1    expression evaluated to false or expression was missing.
  >1   An error occurred.

4.62.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.62.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

Editor's Note: The rationale has been rearranged quite a bit. Only new, not moved, text has been diffmarked.

Historical systems have supported more than four arguments, but there has been a fundamental disagreement between BSD and System V on certain combinations of arguments. Since no accommodation could be reached between the two versions of test without breaking numerous applications, the version of test in POSIX.2 specifies only the relatively simple tests and relies on the syntax of the shell command language for the construction of more complex expressions.
Using the POSIX.2 rules produces completely reliable, portable scripts, which is not always possible using either of the historical forms. Some of the historical behavior is described here to aid conversion of scripts with complex test expressions.

Both BSD and System V support the combining of primaries with the following constructs:

  expression1 -a expression2
              True if both expression1 and expression2 are true.
  expression1 -o expression2
              True if at least one of expression1 and expression2 is true.
  ( expression )
              True if expression is true.

In evaluating these more complex combined expressions, the following precedence rules are used:

  - The unary primaries have higher precedence than the algebraic binary primaries.
  - On BSD systems, the unary primaries have higher precedence than the string binary primaries. On System V systems, the unary primaries have lower precedence than the string binary primaries.
  - The unary and binary primaries have higher precedence than the unary string primary.
  - The ! operator has higher precedence than the -a operator, and the -a operator has higher precedence than the -o operator.
  - The -a and -o operators are left associative.
  - The parentheses can be used to alter the normal precedence and associativity.

The following guidance is offered for the use of the historical expressions:

  - Scripts should be careful when dealing with user-supplied input that could be confused with primaries and operators.
    Unless the application writer knows all the cases that produce input to the script, invocations like:

        test "$1" -a "$2"

    should be written as:

        test "$1" && test "$2"

    to avoid problems with user-supplied values such as $1 set to ! and $2 set to the null string. That is, in cases where portability between implementations based on BSD and System V systems is of concern, replace:

        test expr1 -a expr2

    with:

        test expr1 && test expr2

    and replace:

        test expr1 -o expr2

    with:

        test expr1 || test expr2

    but note that, in test, -a has higher precedence than -o, while && and || have equal precedence in the shell.

    Parentheses or braces can be used in the shell command language to effect grouping. Historical test implementations also support parentheses, but they must be escaped when using sh; for example:

        test \( expr1 -a expr2 \) -o expr3

    This command is not always portable. The following form can be used instead:

        ( test expr1 && test expr2 ) || test expr3

  - The two commands:

        test "$1"
        test ! "$1"

    could not be used reliably on historical systems. Unexpected results would occur if such a string expression were used and $1 expanded to !, (, or a known unary primary. Better constructs were:

        test -n "$1"
        test -z "$1"

    respectively. These suggested replacements have always worked on historical BSD-based implementations, and work on historical System V-based implementations as long as $1 does not expand to = or !=. Using the POSIX.2 rules, any of the four forms shown will work for any possible value of $1.

  - Historical systems were also unreliable given the common construct:

        test "$response" = "expected string"

    One of the following was a more reliable form:
        test "X$response" = "Xexpected string"
        test "expected string" = "$response"

    Note that the second form assumes that expected string could not be confused with any unary primary. If expected string starts with -, (, !, or even =, the first form should be used instead. Using the POSIX.2 rules, any of the three comparison forms is reliable, given any input. (However, note that the strings are quoted in all cases.)

The BSD and System V versions of -f are not the same. The BSD definition was:

  -f file     True if file exists and is not a directory.

The SVID version (true if the file exists and is a regular file) was chosen for this standard because its use is consistent with the -b, -c, -d, and -p operands (file exists and is a specific file type).

The -e primary, possessing similar functionality to that provided by the C-shell, was added because it provides the only way for a shell script to find out if a file exists without trying to open the file. (Since implementations are allowed to add additional file types, a portable script cannot use:

    test -b foo -o -c foo -o -d foo -o -f foo -o -p foo

to find out if foo is an existing file.) On historical BSD systems, the existence of a file could be determined by:

    test -f foo -o -d foo

but there was no easy way to determine that an existing file was a regular file. An earlier draft used the KornShell -a primary (with the same meaning). This was changed to -e because of concerns that humans were likely to confuse the -a primary with the -a binary operator.
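The difference between -e and the SVID-style -f adopted above shows up on a directory. The sketch also applies the earlier advice: the historical test -f foo -o -d foo is written as two test commands joined by the shell's || operator. The scratch directory and the function name is_dir_or_regular are hypothetical.

```sh
# Scratch directory (the name is only an example).
d="${TMPDIR:-/tmp}/test_e_demo.$$"
mkdir "$d"

# -e is true for any existing file, whatever its type.
test -e "$d" && echo "exists"

# The SVID-style -f is true only for a regular file, so it is false here.
test -f "$d" || echo "not a regular file"

# Hypothetical helper: the historical "test -f foo -o -d foo" rewritten
# as two test commands joined by the shell's || operator.
is_dir_or_regular() {
    test -f "$1" || test -d "$1"
}

is_dir_or_regular "$d" && echo "directory or regular file"
rmdir "$d"
```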
History of Decisions Made

The -a and -o binary operators and the grouping parentheses were omitted from POSIX.2 due to a difference between existing implementations of the test utility. The implementations differ in the precedence of the binary primaries = and != compared to the unary primaries -b, -c, -d, -f, -g, -n, -p, -r, -s, -t, -u, -w, -x, and -z. On BSD, Version 7, PWB, and 32V systems, the unary primaries have higher precedence than the binary operators; on System III and System V implementations, the binary operators = and != have higher precedence. The change was apparently made for System III so that the construct:

    test "$1" = "$2"

could be made to work even if $1 started with -. It is believed that this change was a mistake because:

  - It is not a complete solution; if $1 expands to ( or !, it still will not work.
  - It makes it impossible to use the unary primaries -n and -z to test for a null string if there is any chance that the string will expand to =.
  - More importantly, there was the well-known workaround of specifying:

        test X"$1" = X"$2"

    that always worked.

Unfortunately, when the = and != binary primaries were given precedence over the unary primaries, there was no workaround provided for scripts that wanted to reliably specify something like:

    test -n "$1"

because if $1 expands to =, it gives a syntax error.

There was some discussion of outlawing the System V behavior and requiring the more logical precedence that originated in its predecessors and remains in BSD-based systems. However, too many historical applications would break if System V were required to make this change. That number dwarfed the number of scripts using combination logic that would then no longer be strictly portable.
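The following sketch assumes a shell whose test utility follows the one- to four-argument rules in 4.62.4, as the test built into many current shells does. A historical System V test could report a syntax error for some of these values. The rules select an interpretation by the number of arguments rather than by operator precedence. As a result, test -n "$1" and test "$1" = "$2" give the expected answers even for the awkward values discussed above, and the X-prefixed workaround agrees with them.

```sh
# Values that historically confused test when they appeared as operands.
for v in = '!=' '!' '(' -n -z ''; do
    # Two arguments, $1 is the unary primary -n: always a length test of $2.
    if test -n "$v"; then r1=nonnull; else r1=null; fi
    # Three arguments, $2 is the binary primary =: always a string comparison.
    if test "$v" = "$v"; then r2=equal; else r2=different; fi
    # The historical X-prefix workaround gives the same answer.
    if test "X$v" = "X$v"; then r3=equal; else r3=different; fi
    printf '%-4s %-8s %-9s %s\n' "'$v'" "$r1" "$r2" "$r3"
done
```

Every value except the null string should report nonnull. Every row should report equal in both comparison columns.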
POSIX.2 requires that if test is called with one, two, three, or four operands, it correctly interprets the expression even if there is an alternate syntax tree that could lead to a syntax error. This eliminates the requirement that many string comparisons be protected with leading characters, such as

    test X"$1" = X"$2"

and allows the single-argument string form to be used with all possible inputs.

The following examples show some of the changes that are required to make historical BSD and System V-based implementations of test conform to this standard:

    test -d =
        POSIX.2    True if there is a directory named =
        BSD        True if there is a directory named =
        System V   Syntax error; = needs two operands

    test -d = -f
        POSIX.2    False
        BSD        Syntax error; it expects -a or -o after -d =
        System V   False

Implementations are prohibited from extending test with options because options would make the ``test string'' case ambiguous for inputs that might match an extended option. Implementations can add primaries and operators, as indicated.

The following options were not included in POSIX.2, although they are provided by some historical implementations, because these facilities and concepts are neither supported by POSIX.1 {8} nor defined in POSIX.2. These operands should not be used by new implementations for other purposes.

  -h file     True if file exists and is a symbolic link.
  -k file     True if file exists and its sticky bit is set.
  -L file     True if file is a symbolic link.
  -C file     True if file is a contiguous file.
  -S file     True if file is a socket.
  -V file     True if file is a version file.
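The two test -d = rows of the comparison above can be reproduced with a test that follows the POSIX.2 rules. This sketch assumes such a shell. The helper name show_rows and its scratch directory are hypothetical. The helper creates a subdirectory literally named =, runs both commands inside its parent, and prints true or false for each.

```sh
# Hypothetical helper: reproduce both rows in a scratch directory that
# contains a subdirectory literally named "=".
show_rows() {
    d="${TMPDIR:-/tmp}/test_eq_demo.$$"
    mkdir "$d" "$d/=" || return 1
    (
        cd "$d" || exit 1
        # Two arguments, $1 is a unary primary: is there a directory named =?
        if test -d =; then echo true; else echo false; fi
        # Three arguments, $2 is the binary primary =: compare "-d" with "-f".
        if test -d = -f; then echo true; else echo false; fi
    )
    rm -r "$d"
}

show_rows    # true, then false
```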
The following option was not included because it was undocumented in most implementations, has been removed from some implementations (including System V), and the functionality is provided by the shell (see 3.6.2).

  -l string   The length of the string string.

The -b, -c, -g, -p, -u, and -x operands are derived from the SVID; historical BSD does not provide them. The -k operand is derived from System V; historical BSD does not provide it.

On historical BSD systems, test -w directory always returned false because test tried to open the directory for writing, which always fails.

Some additional primaries appeared in an earlier draft as part of the Conditional Command ([[ ]]). Some were newly invented and others came from the KornShell: s1 > s2, s1 < s2, str = pattern, str != pattern, f1 -nt f2, f1 -ot f2, and f1 -ef f2. They were not carried forward into the test utility when the Conditional Command was removed from the shell because they have not been included in the test utility built into historical implementations of the sh utility.

The -t file_descriptor primary is shown with a mandatory argument because the grammar is ambiguous if it can be omitted. Historical implementations have allowed it to be omitted, providing a default of 1.

END_RATIONALE

4.63 touch - Change file access and modification times

4.63.1 Synopsis

    touch [-acm] [ -r ref_file | -t time ] file ...

    Obsolescent Version:

    touch [-acm] [date_time] file ...

4.63.2 Description

The touch utility shall change the modification and/or access times of files.
The modification time is equivalent to the value of the st_mtime member of the stat structure for a file, as described in POSIX.1 {8}; the access time is equivalent to the value of st_atime.

The time used can be specified by the -t time option-argument, the corresponding time field(s) of the file referenced by the -r ref_file option-argument, or the date_time operand, as specified in the following subclauses. If none of these are specified, touch shall use the current time [the value returned by the equivalent of the POSIX.1 {8} time() function].

For each file operand, touch shall perform actions equivalent to the following functions defined in POSIX.1 {8}:

  (1) If file does not exist, a creat() function call is made with the file operand used as the path argument and the value of the bitwise inclusive OR of S_IRUSR, S_IWUSR, S_IRGRP, S_IWGRP, S_IROTH, and S_IWOTH used as the mode argument.

  (2) The utime() function is called with the following arguments:

      (a) The file operand is used as the path argument.

      (b) The utimbuf structure members actime and modtime are determined as described under 4.63.3.

4.63.3 Options

The touch utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

  -a          Change the access time of file. Do not change the modification time unless -m is also specified.
  -c          Do not create a specified file if it does not exist. Do not write any diagnostic messages concerning this condition.
  -m          Change the modification time of file. Do not change the access time unless -a is also specified.
  -r ref_file Use the corresponding time of the file named by the pathname ref_file instead of the current time.
  -t time     Use the specified time instead of the current time. The option-argument shall be a decimal number of the form:

                  [[CC]YY]MMDDhhmm[.SS]

              where each pair of digits represents the following:

                  MM   The month of the year (01-12).
                  DD   The day of the month (01-31).
                  hh   The hour of the day (00-23).
                  mm   The minute of the hour (00-59).
                  CC   The first two digits of the year (the century).
                  YY   The second two digits of the year.
                  SS   The second of the minute (00-61).

              Both CC and YY shall be optional. If neither is given, the current year shall be assumed. If YY is specified, but CC is not, CC shall be derived as follows:

                  If YY is:   CC becomes:
                  ---------   -----------
                  69-99       19
                  00-68       20

              The resulting time shall be affected by the value of the TZ environment variable. If the resulting time value precedes the Epoch, touch shall exit immediately with an error status. The range of valid times past the Epoch is implementation defined, but shall extend to at least midnight 1 January 2000 UTC.

              The range for SS is (00-61) rather than (00-59) because of leap seconds. If SS is 60 or 61, and the resulting time, as affected by the TZ environment variable, does not refer to a leap second, the resulting time shall be one or two seconds after a time where SS is 59. If SS is not given a value, it is assumed to be zero.

If neither the -a nor the -m option was specified, touch shall behave as if both the -a and -m options were specified.

4.63.4 Operands

The following operands shall be supported by the implementation:

  file        A pathname of a file whose times are to be modified.
  date_time   (Obsolescent.)
              Use the specified date_time instead of the current time. The operand is a decimal number of the form:

                  MMDDhhmm[yy]

              where MM, DD, hh, and mm are as described for the time option-argument to the -t option. The optional yy is interpreted as follows. If yy is not specified, the current year shall be used. If yy is in the range 69-99, the years 1969-1999, respectively, shall be used. Otherwise, the results are unspecified.

If no -r option is specified, no -t option is specified, at least two operands are specified, and the first operand is an eight- or ten-digit decimal integer, the first operand shall be assumed to be a date_time operand. Otherwise, the first operand shall be assumed to be a file operand.

4.63.5 External Influences

4.63.5.1 Standard Input

None.

4.63.5.2 Input Files

None.

4.63.5.3 Environment Variables

The following environment variables shall affect the execution of touch:

  LANG        This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
  LC_ALL      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
  LC_CTYPE    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).
  LC_MESSAGES This variable shall determine the language in which messages should be written.
  TZ          If the time option-argument (or operand; see above) is specified, TZ shall be used to interpret the time for the specified time zone.
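The -t option-argument format and the century rule in 4.63.3 can be sketched as a small parser. The function name touch_t_fields is hypothetical. It only splits the fields and applies the YY-to-CC rule. It does not check the field ranges, apply TZ, or reject times before the Epoch. When neither CC nor YY is given, the current year is taken from date +%Y.

```sh
# Hypothetical helper: split a -t option-argument of the form
# [[CC]YY]MMDDhhmm[.SS] into "CCYY MM DD hh mm SS" (ranges not checked).
touch_t_fields() {
    t=$1 ss=00
    case $t in
        *.*) ss=${t##*.}; t=${t%.*} ;;
    esac
    # Every remaining character must be a digit, and SS must have two digits.
    case $t$ss in
        *[!0-9]*) return 1 ;;
    esac
    [ ${#ss} -eq 2 ] || return 1
    case ${#t} in
        8)  year=$(date +%Y) ;;                   # neither CC nor YY
        10) yy=${t%????????}; t=${t#??}           # YY only
            if [ "$yy" -ge 69 ]; then year=19$yy; else year=20$yy; fi ;;
        12) year=${t%????????}; t=${t#????} ;;    # CC and YY
        *)  return 1 ;;
    esac
    # What remains is exactly MMDDhhmm.
    mm=${t%??????}
    dd=${t#??}; dd=${dd%????}
    hh=${t#????}; hh=${hh%??}
    mi=${t#??????}
    echo "$year $mm $dd $hh $mi $ss"
}

touch_t_fields 9912312359.60    # 1999 12 31 23 59 60
```

A ten-digit value is read as YYMMDDhhmm, never as CCMMDDhhmm, so 6801010000 yields the year 2068 and 6912312359 yields 1969.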

4.63.5.4 Asynchronous Events

Default.

4.63.6 External Effects

4.63.6.1 Standard Output

None.

4.63.6.2 Standard Error

Used only for diagnostic messages.

4.63.6.3 Output Files

None.

4.63.7 Extended Description

None.

4.63.8 Exit Status

The touch utility shall exit with one of the following values:

 0   The utility executed successfully and all requested changes were made.
>0   An error occurred.

4.63.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.63.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The functionality of touch is described almost entirely through references to functions in POSIX.1 {8}. In this way, there is no duplication of effort required for describing such side effects as the relationship of user IDs to the user database, permissions, etc.

The interpretation of time is taken to be ``seconds since the Epoch,'' as defined by 2.2.2.129. It should be noted that POSIX.1 {8} conforming implementations do not take leap seconds into account when computing seconds since the Epoch. When SS=60 is used on POSIX.1 {8} conforming implementations, the resulting time always refers to 1 plus ``seconds since the Epoch'' for a time when SS=59.

Note that although the -t time option-argument and the obsolescent date_time operand can specify values in 1969, the access time and modification time fields are defined in terms of seconds since the Epoch (midnight on 1 January 1970 UTC). Therefore, depending on the value of TZ when touch is run, there will never be more than a few valid hours in 1969 and there need not be any valid times in 1969.
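
The -t option-argument format and the century rule described above can be sketched in C as follows. This is a minimal, non-authoritative illustration: parse_touch_time() and its current_year parameter are hypothetical names, only the digit syntax and the per-field ranges are checked, and the conversion to seconds since the Epoch is assumed to be left to the POSIX.1 mktime() function, which interprets the fields as local time according to TZ.

```c
#include <ctype.h>
#include <string.h>
#include <time.h>

/* Return the value of the two decimal digits at s, or -1 if either
 * character is not a digit. */
static int two_digits(const char *s)
{
    if (!isdigit((unsigned char)s[0]) || !isdigit((unsigned char)s[1]))
        return -1;
    return (s[0] - '0') * 10 + (s[1] - '0');
}

/* Hypothetical helper: parse a touch -t option-argument of the form
 * [[CC]YY]MMDDhhmm[.SS] into *tm.  current_year (e.g. 1991) is used when
 * neither CC nor YY is given.  Returns 0 on success, -1 if the argument
 * is malformed.  Calendar validity (e.g. 31 February) is not checked. */
int parse_touch_time(const char *arg, int current_year, struct tm *tm)
{
    const char *dot = strchr(arg, '.');
    size_t len = dot ? (size_t)(dot - arg) : strlen(arg);
    int year, cc = -1, yy = -1, ss = 0;

    if (len != 8 && len != 10 && len != 12)
        return -1;
    if (len == 12) {                          /* leading CC */
        if ((cc = two_digits(arg)) < 0)
            return -1;
        arg += 2;
    }
    if (len >= 10) {                          /* YY */
        if ((yy = two_digits(arg)) < 0)
            return -1;
        arg += 2;
    }
    if (cc >= 0)
        year = cc * 100 + yy;
    else if (yy >= 0)
        year = (yy >= 69 ? 1900 : 2000) + yy; /* 69-99 -> 19YY, 00-68 -> 20YY */
    else
        year = current_year;

    int mm = two_digits(arg), dd = two_digits(arg + 2);
    int hh = two_digits(arg + 4), mi = two_digits(arg + 6);
    if (dot && (strlen(dot + 1) != 2 || (ss = two_digits(dot + 1)) < 0))
        return -1;
    if (mm < 1 || mm > 12 || dd < 1 || dd > 31 || hh < 0 || hh > 23 ||
        mi < 0 || mi > 59 || ss > 61)
        return -1;

    memset(tm, 0, sizeof *tm);
    tm->tm_year = year - 1900;
    tm->tm_mon = mm - 1;
    tm->tm_mday = dd;
    tm->tm_hour = hh;
    tm->tm_min = mi;
    tm->tm_sec = ss;      /* 60 or 61 is carried forward by mktime() */
    tm->tm_isdst = -1;    /* let mktime() determine DST from TZ */
    return 0;
}
```

A caller would pass the filled structure to mktime() and reject a result that precedes the Epoch. Because seconds since the Epoch is a linear function of tm_sec, a tm_sec of 60 should come out one second after the corresponding SS=59 time, in line with the rationale above.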

History of Decisions Made

There are some significant differences between this touch and those in System V and BSD systems. They are upward compatible for existing applications from both implementations.

(1) In System V, an ambiguity exists when a pathname that is a decimal number leads the operands; it is treated as a time value. In BSD, no time value is allowed; files may only be touched to the current time. The [-t time] construct solves these problems for future portable applications (note that the -t option is not existing practice).

(2) The inclusion of the century digits, CC, is also new. Note that a ten-digit time value is treated as if YY, and not CC, were specified. The caveat about the range of dates following the Epoch was included as recognition that some UNIX systems will not be able to represent dates beyond January 19, 2038 (03:14:07 UTC), because they use a 32-bit signed int as a time holder.

One ambiguous situation occurs if -t time is not specified, -r ref_file is not specified, and the first operand is an eight- or ten-digit decimal number. A portable script can avoid this problem by using:

    touch -- file

or

    touch ./file

in this case.

The -r option was added because several comments requested this capability. This option was named -f in an earlier draft, but was changed because the -f option is used in the BSD version of touch with a different meaning.

At least one historical implementation of touch incremented the exit code if -c was specified and the file did not exist. This standard requires exit status zero if no errors occur.
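
The rule from 4.63.4 that decides whether the first operand is an obsolescent date_time operand can be written as a small predicate. The sketch below is hypothetical (first_operand_is_date_time() is not part of any standard interface) and models only that rule, not option parsing. It shows why ./file sidesteps the ambiguity: with the leading ./ the operand is no longer a pure eight- or ten-digit decimal integer.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical predicate: nonzero if, under the 4.63.4 rule, the first
 * operand of touch is taken as a date_time operand.  have_r and have_t
 * record whether -r or -t was given; noperands counts all operands. */
int first_operand_is_date_time(int have_r, int have_t, int noperands,
                               const char *first)
{
    size_t len = strlen(first);

    if (have_r || have_t || noperands < 2 || (len != 8 && len != 10))
        return 0;
    for (size_t i = 0; i < len; i++)      /* must be all decimal digits */
        if (!isdigit((unsigned char)first[i]))
            return 0;
    return 1;
}
```

For example, with the operands 01011200 and file the predicate returns 1, while ./01011200 and file gives 0.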
END_RATIONALE

4.64 tr - Translate characters

4.64.1 Synopsis

    tr [-cs] string1 string2
    tr -s [-c] string1
    tr -d [-c] string1
    tr -ds [-c] string1 string2

4.64.2 Description

The tr utility shall copy the standard input to the standard output with substitution or deletion of selected characters. The options specified and the string1 and string2 operands shall control translations that occur while copying characters and collating elements.

4.64.3 Options

The tr utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-c
     Complement the set of characters specified by string1. See 4.64.7.
-d
     Delete all occurrences of input characters that are specified by string1.
-s
     Replace instances of repeated characters with a single character, as described in 4.64.7.

4.64.4 Operands

The following operands shall be supported by the implementation:

string1
string2
     Translation control strings. Each string shall represent a set of characters to be converted into an array of characters used for the translation. For a detailed description of how the strings are interpreted, see 4.64.7.

4.64.5 External Influences

4.64.5.1 Standard Input

The standard input can be any type of file.

4.64.5.2 Input Files

None.

4.64.5.3 Environment Variables

The following environment variables shall affect the execution of tr:

LANG
     This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
LC_ALL
     This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
LC_COLLATE
     This variable shall determine the behavior of range expressions and equivalence classes.
LC_CTYPE
     This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments) and the behavior of character classes.
LC_MESSAGES
     This variable shall determine the language in which messages should be written.

4.64.5.4 Asynchronous Events

Default.

4.64.6 External Effects

4.64.6.1 Standard Output

The tr output shall be identical to the input, with the exception of the specified transformations.

4.64.6.2 Standard Error

Used only for diagnostic messages.

4.64.6.3 Output Files

None.

4.64.7 Extended Description

The operands string1 and string2 (if specified) define two arrays of characters or collating elements. The following conventions can be used to specify characters or collating elements:

character
     Any character not described by one of the conventions below shall represent itself.

\octal
     Octal sequences can be used to represent characters with specific coded values. An octal sequence shall consist of a backslash followed by the longest sequence of one-, two-, or three-octal-digit characters (01234567). The sequence shall cause the character whose encoding is represented by the one-, two-, or three-digit octal integer to be placed into the array.
     If the size of a byte on the system is greater than nine bits, the valid escape sequence used to represent a byte is implementation-defined. Multibyte characters require multiple, concatenated escape sequences of this type, including the leading \ for each byte.

\character
     The backslash-escape sequences in Table 2-15 (see 2.12) shall be supported. The results of using any other character, other than an octal digit, following the backslash are unspecified.

c-c
     Represents the range of collating elements between the range endpoints, inclusive, as defined by the current setting of the LC_COLLATE locale category. The starting endpoint shall precede the second endpoint in the current collation order. The characters or collating elements in the range shall be placed in the array in ascending collation sequence. No multicharacter collating elements shall be included in the range.

[:class:]
     Represents all characters belonging to the defined character class, as defined by the current setting of the LC_CTYPE locale category. The following character class names shall be accepted when specified in string1:

          alnum   cntrl   lower   space
          alpha   digit   print   upper
          blank   graph   punct   xdigit

     When the -d and -s options are specified together, any of the character class names shall be accepted in string2. Otherwise, only character class names lower or upper shall be accepted in string2 and then only if the corresponding character class (upper and lower, respectively) is specified in the same relative position in string1. Such a specification shall be interpreted as a request for case conversion. When [:lower:] appears in string1 and [:upper:] appears in string2, the arrays shall contain the characters from the toupper mapping in the LC_CTYPE category of the current locale.
     When [:upper:] appears in string1 and [:lower:] appears in string2, the arrays shall contain the characters from the tolower mapping in the LC_CTYPE category of the current locale. The first character from each mapping pair shall be in the array for string1 and the second character from each mapping pair shall be in the array for string2 in the same relative position.

     Except for case conversion, the characters specified by a character class expression shall be placed in the array in an unspecified order.

     If the name specified for class does not define a valid character class in the current locale, the behavior is undefined.

[=equiv=]
     Represents all characters or collating elements belonging to the same equivalence class as equiv, as defined by the current setting of the LC_COLLATE locale category. An equivalence class expression shall be allowed only in string1, or in string2 when it is being used by the combined -d and -s options. The characters belonging to the equivalence class shall be placed in the array in an unspecified order.

[x*n]
     Represents n repeated occurrences of the character or collating symbol x. Because this expression is used to map multiple characters to one, it is only valid when it occurs in string2. If n is omitted or is zero, it shall be interpreted as large enough to extend the string2-based sequence to the length of the string1-based sequence. If n has a leading zero, it shall be interpreted as an octal value. Otherwise, it shall be interpreted as a decimal value.
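
To make these array-building conventions concrete, here is a rough C sketch that expands a string operand into its array. It assumes the POSIX locale, so a range follows ascending byte (ASCII) order, and it handles only plain characters, c-c ranges, and [x*n]; backslash escapes, [:class:], and [=equiv=] are deliberately omitted, and it does not enforce that [x*n] appears only in string2. expand_tr_string() and its target_len parameter (the length of the string1 array, used when n is omitted or zero) are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: expand a tr string operand into out (capacity cap).
 * Returns the array length, or -1 on a reversed range, a malformed count,
 * or overflow. */
long expand_tr_string(const char *spec, unsigned char *out, size_t cap,
                      size_t target_len)
{
    const unsigned char *p = (const unsigned char *)spec;
    size_t n = 0;                              /* elements written so far */

    while (*p) {
        const char *close = NULL;
        if (p[0] == '[' && p[1] != '\0' && p[2] == '*')
            close = strchr((const char *)p + 3, ']');
        if (close) {                           /* [x*n] repetition */
            const char *digits = (const char *)p + 3;
            size_t ndig = (size_t)(close - digits);
            unsigned long count = 0;
            if (strspn(digits, "0123456789") < ndig)
                return -1;                     /* non-digit inside [x*n] */
            if (ndig > 0) {
                char *end;
                /* A leading zero means octal; otherwise decimal. */
                count = strtoul(digits, &end, digits[0] == '0' ? 8 : 10);
                if (end != close)
                    return -1;                 /* e.g. 8 or 9 in an octal count */
            }
            if (count == 0)                    /* omitted or zero: pad to string1 */
                count = target_len > n ? target_len - n : 0;
            if (count > cap - n)
                return -1;
            memset(out + n, p[1], count);
            n += count;
            p = (const unsigned char *)close + 1;
        } else if (p[1] == '-' && p[2] != '\0') { /* c-c range, byte order */
            if (p[0] > p[2])
                return -1;                     /* start collates after end */
            for (unsigned c = p[0]; c <= p[2]; c++) {
                if (n == cap)
                    return -1;
                out[n++] = (unsigned char)c;
            }
            p += 3;
        } else {                               /* any other character is itself */
            if (n == cap)
                return -1;
            out[n++] = *p++;
        }
    }
    return (long)n;
}
```

With this sketch, the operand [d*] expands to as many d characters as the string1 array has elements, which is how tr 0123456789 '[d*]' maps every digit to d. A System V style operand such as [a-z] yields [, the 26 lowercase letters, and ], so the brackets translate to themselves.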

When the -d option is not specified:

  - Each input character or collating element found in the array specified by string1 shall be replaced by the character or collating element in the same relative position in the array specified by string2. When the array specified by string2 is shorter than the one specified by string1, the results are unspecified.

  - If the -c option is specified without -d, the complement of the characters specified by string1 (the set of all characters in the current character set, as defined by the current setting of LC_CTYPE, except for those actually specified in the string1 operand) shall be placed in the array in ascending collation sequence, as defined by the current setting of LC_COLLATE.

  - Because the order in which characters specified by character class expressions or equivalence class expressions are placed in the array is unspecified, such expressions should only be used if the intent is to map several characters into one. An exception is case conversion, as described previously.

When the -d option is specified:

  - Input characters or collating elements found in the array specified by string1 shall be deleted.

  - When the -c option is specified with -d, all characters except those specified by string1 shall be deleted. The contents of string2 shall be ignored, unless the -s option is also specified.

  - The same string cannot be used for both the -d and the -s option; when both options are specified, both string1 (used for deletion) and string2 (used for squeezing) shall be required.
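
The translation and deletion rules above, including the ascending order of the -c complement, can be sketched for single-byte characters in the POSIX locale, where ascending collation order is simply ascending byte value. tr_filter() and its parameters are hypothetical, and the arrays are assumed to be already expanded. Two behaviors are assumptions of this sketch rather than requirements of the text: a character repeated in string1 takes its last mapping, and a string2 shorter than string1 is reported as an error (-1) instead of being padded.

```c
#include <stddef.h>
#include <string.h>

#define TR_UNCHANGED (-1)
#define TR_DELETE    (-2)

/* Hypothetical sketch: apply translation (optionally with -c) or deletion
 * (-d, optionally with -c) to len input bytes.  a1/n1 and a2/n2 are the
 * expanded string1 and string2 arrays (a2 is unused when delete_mode is
 * set).  Writes at most len bytes to out and returns the output length. */
long tr_filter(const unsigned char *in, size_t len, unsigned char *out,
               const unsigned char *a1, size_t n1,
               const unsigned char *a2, size_t n2,
               int complement, int delete_mode)
{
    int map[256];                  /* per-byte action or replacement byte */
    unsigned char in_set1[256];    /* membership test for string1 */
    size_t i, k = 0, o = 0;

    for (i = 0; i < 256; i++)
        map[i] = TR_UNCHANGED;
    memset(in_set1, 0, sizeof in_set1);
    for (i = 0; i < n1; i++)
        in_set1[a1[i]] = 1;

    if (complement) {
        /* The complement array: every byte not in string1, ascending. */
        for (i = 0; i < 256; i++) {
            if (in_set1[i])
                continue;
            if (delete_mode)
                map[i] = TR_DELETE;
            else if (k < n2)
                map[i] = a2[k++];
            else
                return -1;         /* string2 too short: unspecified */
        }
    } else {
        for (i = 0; i < n1; i++) {
            if (delete_mode)
                map[a1[i]] = TR_DELETE;
            else if (i < n2)
                map[a1[i]] = a2[i];  /* same relative position */
            else
                return -1;         /* string2 too short: unspecified */
        }
    }

    for (i = 0; i < len; i++) {
        int action = map[in[i]];
        if (action == TR_DELETE)
            continue;
        out[o++] = action == TR_UNCHANGED ? in[i] : (unsigned char)action;
    }
    return (long)o;
}
```

For instance, translating hello, world with the arrays lo and LO gives heLLO, wOrLd, while -c -d with string1 lo keeps only the l and o characters.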

When the -s option is specified, after any deletions or translations have taken place, repeated sequences of the same character shall be replaced by one occurrence of the same character, if the character is found in the array specified by the last operand. If the last operand contains a character class, such as the following example:

    tr -s '[:space:]'

the last operand's array shall contain all of the characters in that character class. However, in a case conversion, as described previously, such as

    tr -s '[:upper:]' '[:lower:]'

the last operand's array shall contain only those characters defined as the second characters in each of the toupper or tolower character pairs, as appropriate.

4.64.8 Exit Status

The tr utility shall exit with one of the following values:

 0   All input was processed successfully.
>0   An error occurred.

4.64.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.64.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

If necessary, string1 and string2 can be quoted to avoid pattern matching by the shell.

The following example creates a list of all words in file1, one per line, in file2, where a word is taken to be a maximal string of letters.

    tr -cs "[:alpha:]" "[\n*]" <file1 >file2

If an ordinary digit (representing itself) is to follow an octal sequence, the octal sequence must use the full three digits to avoid ambiguity.

When string2 is shorter than string1, a difference results between historical System V and BSD systems. A BSD system will pad string2 with the last character found in string2. Thus, it is possible to do the following:

    tr 0123456789 d

which would translate all digits to the letter d.
Since this area is specifically unspecified in the standard, both the BSD and System V behaviors are allowed, but a conforming application cannot rely on the BSD behavior. It would have to code the example in the following way:

    tr 0123456789 '[d*]'

It should be noted that, despite similarities in appearance, the string operands used by tr are not regular expressions.

On historical System V systems, a range expression requires enclosing square brackets, such as:

    tr '[a-z]' '[A-Z]'

However, BSD-based systems did not require the brackets, and this convention is used by POSIX.2 to avoid breaking large numbers of BSD scripts:

    tr a-z A-Z

The preceding System V script will continue to work because the brackets, treated as regular characters, are translated to themselves. However, any System V script that relied on a-z representing the three characters a, -, and z will have to be rewritten as az- or a\-z.

History of Decisions Made

In some earlier drafts, an explicit option, -n, was added to disable the historical behavior of stripping NUL characters from the input. It was felt that automatically stripping NUL characters from the input was not correct functionality. However, the removal of -n in a later draft does not remove the requirement that tr correctly process NUL characters in its input stream. NUL characters can be stripped by using tr -d '\000'.

Historical implementations of tr differ widely in syntax and behavior. For example, the BSD version has not needed the bracket characters for the repetition sequence. The POSIX.2 tr syntax is based more closely on the System V and XPG3 model, while attempting to accommodate historical BSD implementations.

In the case of the short string2 padding, the decision was to unspecify the behavior and preserve System V and XPG scripts, which might find difficulty with the BSD method. The assumption was made that BSD users of tr will have to make accommodations to meet the POSIX.2 syntax anyway, and since it is possible to use the repetition sequence to duplicate the desired behavior, whereas there is no simple way to achieve the System V method, this was the correct, if not desirable, approach.

The use of octal values to specify control characters, while having historical precedents, is not portable. The introduction of escape sequences for control characters should provide the necessary portability. It is recognized that this may cause some historical scripts to break.

A previous draft included support for multicharacter collating elements. Several balloters pointed out that, while tr does employ some syntactical elements from regular expressions, the aim of tr is quite different; ranges, for instance, do not mean the same thing (``any of the chars in the range matches,'' versus ``translate each character in the range to the output counterpart''). As a result, the previously included support for multicharacter collating elements has been removed. What remains are ranges in current collation order (to support, e.g., accented characters), character classes, and equivalence classes.

In XPG3, the [:class:] and [=equiv=] conventions are shown with double brackets, as in regular expression syntax. Several balloters objected to this, pointing out that tr does not implement regular expression principles, but merely borrows part of the syntax. Consequently, the [:class:] and [=equiv=] conventions should be regarded as syntactical elements on a par with [x*n], which is not an RE bracket expression.

END_RATIONALE

4.65 true - Return true value

4.65.1 Synopsis

    true

4.65.2 Description

The true utility shall return with exit code zero.

4.65.3 Options

None.

4.65.4 Operands

None.

4.65.5 External Influences

4.65.5.1 Standard Input

None.

4.65.5.2 Input Files

None.

4.65.5.3 Environment Variables

None.

4.65.5.4 Asynchronous Events

Default.

4.65.6 External Effects

4.65.6.1 Standard Output

None.

4.65.6.2 Standard Error

None.

4.65.6.3 Output Files

None.

4.65.7 Extended Description

None.

4.65.8 Exit Status

The true utility always exits with a value of zero.

4.65.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.65.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The true utility is typically used in shell scripts. The special built-in utility : (see 3.14.2) is sometimes more efficient than true.

History of Decisions Made

The true utility has been retained in POSIX.2, even though the shell special built-in : provides similar functionality, because true is widely used in existing scripts and is less cryptic to novice human script readers.

END_RATIONALE

4.66 tty - Return user's terminal name

4.66.1 Synopsis

    tty

Obsolescent Version:

    tty -s

4.66.2 Description

The tty utility shall write to the standard output the name of the terminal that is open as standard input. The name that is used shall be equivalent to the string that would be returned by the POSIX.1 {8} ttyname() function.
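
A minimal sketch of this behavior in terms of the POSIX.1 isatty() and ttyname() functions follows. tty_report() is a hypothetical helper: it takes the file descriptor to examine (the utility itself would pass 0, standard input), the stream to write to, and a flag that mimics the obsolescent -s option, and it returns the exit status values listed in 4.66.8 (using 2 for an error). Only the POSIX-locale message is shown.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper modelling tty's output and exit status for fd.
 * If silent is nonzero it behaves like -s and writes nothing to out.
 * Returns 0 (terminal), 1 (not a terminal), or 2 (error). */
int tty_report(int fd, FILE *out, int silent)
{
    if (!isatty(fd)) {
        if (!silent)
            fputs("not a tty\n", out);    /* POSIX-locale message */
        return 1;
    }
    if (silent)
        return 0;                         /* -s needs no pathname */

    const char *name = ttyname(fd);
    if (name == NULL) {                   /* terminal, but no name found */
        fprintf(stderr, "tty: %s\n", strerror(errno));
        return 2;
    }
    fprintf(out, "%s\n", name);
    return 0;
}
```

A main() for the utility might simply return tty_report(STDIN_FILENO, stdout, silent), where silent records whether -s was given. The -s path never calls ttyname(), so it does not depend on forming a valid pathname.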

4.66.3 Options

The tty utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following option shall be supported by the implementation:

-s
     (Obsolescent.) Do not write the terminal name. Only the exit status shall be affected by this option. The terminal status shall be determined as if the POSIX.1 {8} isatty() function were used.

4.66.4 Operands

None.

4.66.5 External Influences

4.66.5.1 Standard Input

Although no input is read from standard input, standard input shall be examined to determine whether or not it is a terminal, and/or to determine the name of the terminal.

4.66.5.2 Input Files

None.

4.66.5.3 Environment Variables

The following environment variables shall affect the execution of tty:

LANG
     This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
LC_ALL
     This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
LC_CTYPE
     For the obsolescent version, this variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).
LC_MESSAGES
     This variable shall determine the language in which messages should be written.

4.66.5.4 Asynchronous Events

Default.

4.66.6 External Effects

4.66.6.1 Standard Output

If the -s option is specified, standard output shall not be used. If the -s option is not specified and standard input is a terminal device, a pathname of the terminal as specified by POSIX.1 {8} ttyname() shall be written in the following format:

    "%s\n", <terminal name>

Otherwise, a message shall be written indicating that standard input is not connected to a terminal. In the POSIX Locale, the tty utility shall use the format:

    "not a tty\n"

4.66.6.2 Standard Error

Used only for diagnostic messages.

4.66.6.3 Output Files

None.

4.66.7 Extended Description

None.

4.66.8 Exit Status

The tty utility shall exit with one of the following values:

 0   Standard input is a terminal.
 1   Standard input is not a terminal.
>1   An error occurred.

4.66.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.66.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

This utility checks the status of the file open as standard input against that of a system-defined set of files. It is possible that no match can be found, or that the match found need not be the same file as that which was opened for standard input (although they are the same device).

The -s option is useful only if the exit code is wanted. It does not rely on the ability to form a valid pathname. The -s option was made obsolescent because the same functionality is provided by test -t 0, but was not dropped completely because historical scripts depend on this form.

History of Decisions Made

The definition of tty was made more explicit to explain the difference between a tty and a pathname of a tty.

END_RATIONALE

4.67 umask - Get or set the file mode creation mask

4.67.1 Synopsis

    umask [-S] [mask]

4.67.2 Description

The umask utility shall set the file mode creation mask of the current shell execution environment (see 3.12) to the value specified by the mask operand. This mask shall affect the initial value of the file permission bits of subsequently created files. If the mask operand is not specified, the umask utility shall write to standard output the value of the invoking process's file mode creation mask.

4.67.3 Options

The umask utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following option shall be supported by the implementation:

-S
     Produce symbolic output. The default output style is unspecified, but shall be recognized on a subsequent invocation of umask on the same system as a mask operand to restore the previous file mode creation mask.

4.67.4 Operands

The following operand shall be supported by the implementation:

mask
     A string specifying the new file mode creation mask. The string is treated in the same way as the mode operand described in 4.7.7 (chmod Extended Description).

     For a symbolic_mode value, the new value of the file mode creation mask shall be the logical complement of the file permission bits portion of the file mode specified by the symbolic_mode string.

     In a symbolic_mode value, the permissions op characters + and - shall be interpreted relative to the current file mode creation mask; + shall cause the bits for the indicated permissions to be cleared in the mask; - shall cause the bits for the indicated permissions to be set in the mask.

     The interpretation of mode values that specify file mode bits other than the file permission bits is unspecified.

     In the obsolescent octal integer form of mode, the specified bits shall be set in the file mode creation mask.

     The file mode creation mask shall be set to the resulting numeric value.

     As in chmod, application use of the octal number form for the mode values is obsolescent.

     The default output of a prior invocation of umask on the same system with no operand shall also be recognized as a mask operand. The use of an operand obtained in this way is not obsolescent, even if it is an octal number.

4.67.5 External Influences

4.67.5.1 Standard Input

None.

4.67.5.2 Input Files

None.

4.67.5.3 Environment Variables

The following environment variables shall affect the execution of umask:

LANG
     This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
LC_ALL
     This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
LC_CTYPE
     This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).
LC_MESSAGES
     This variable shall determine the language in which messages should be written.

4.67.5.4 Asynchronous Events

Default.

4.67.6 External Effects

4.67.6.1 Standard Output

When the mask operand is not specified, the umask utility shall write a message to standard output that can later be used as a umask mask operand.
If -S is specified, the message shall be in the following format:

    "u=%s,g=%s,o=%s\n", <owner permissions>, <group permissions>, <other permissions>

where the three values shall be combinations of letters from the set {r, w, x}; the presence of a letter shall indicate that the corresponding bit is clear in the file mode creation mask.

If a mask operand is specified, there shall be no output written to standard output.

4.67.6.2 Standard Error

Used only for diagnostic messages.

4.67.6.3 Output Files

None.

4.67.7 Extended Description

None.

4.67.8 Exit Status

The umask utility shall exit with one of the following values:

 0   The file mode creation mask was successfully changed, or no mask operand was supplied.
>0   An error occurred.

4.67.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.67.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

Since umask affects the current shell execution environment, it is generally provided as a shell regular built-in. If it is called in a subshell or separate utility execution environment, such as one of the following:

    (umask 002)
    nohup umask ...
    find . -exec umask ... \;

it will not affect the file mode creation mask of the caller's environment.

The table mapping octal mode values in 4.7.7 does not require that the symbolic constants have those particular values.

In contrast to the negative permission logic provided by the file mode creation mask and the octal number form of the mask argument, the symbolic form of the mask argument specifies those permissions that are left alone.

Either of the commands:

    umask a=rx,ug+w
    umask 002

sets the mode mask so that subsequently created files have their S_IWOTH bit cleared.

After setting the mode mask with either of the above commands, the umask command can be used to write out the current value of the mode mask:

    $ umask
    0002

(The output format is unspecified, but historical implementations use the obsolescent octal integer mode format.)

    $ umask -S
    u=rwx,g=rwx,o=rx

Either of these outputs can be used as the mask operand to a subsequent invocation of the umask utility.

Assuming the mode mask is set as above, the command:

    umask g-w

sets the mode mask so that subsequently created files have their S_IWGRP and S_IWOTH bits cleared.

The command:

    umask -- -w

sets the mode mask so that subsequently created files have all their write bits cleared. Note that mask operands -r, -w, -x, or anything beginning with a hyphen, must be preceded by -- to keep them from being interpreted as options.

History of Decisions Made

The description of the historical utility was modified to allow it to use the symbolic modes of chmod. The -s option used in earlier drafts was changed to -S because -s could be confused with a symbolic_mode form of mask referring to the S_ISUID and S_ISGID bits.

The default output style is implementation defined to permit implementors to provide migration to the new symbolic style at the time most appropriate to their users.

Earlier drafts of this standard specified an -o flag to force octal mode output.
This was dropped because the octal mode may not be sufficient to specify all of the information that may be present in the file mode creation mask when more secure file access permission checks are implemented.

It has been suggested that trusted systems developers might appreciate softening the requirement that the mode mask ``affects'' the file access permissions, since it seems access control lists might replace the mode mask to some degree. The wording has been changed to say that it affects the file permission bits, and leaves the details of how they affect the file access permissions to the description in POSIX.1 {8}.

END_RATIONALE

4.68 uname - Return system name

4.68.1 Synopsis

    uname [-amnrsv]

4.68.2 Description

By default, the uname utility shall write the operating system name to standard output. When options are specified, symbols representing one or more system characteristics shall be written to the standard output. The format and contents of the symbols are implementation defined. On systems conforming to POSIX.1 {8}, the symbols written shall be those supported by the POSIX.1 {8} uname() function.

4.68.3 Options

The uname utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

    -a
        Behave as though all of the options -mnrsv were specified.
    -m
        Write the name of the hardware type on which the system is running to standard output.
    -n
        Write the name of this node within an implementation-specified communications network.
    -r
        Write the current release level of the operating system implementation.
    -s
        Write the name of the implementation of the operating system.
    -v
        Write the current version level of this release of the operating system implementation.
If no options are specified, the uname utility shall write the operating system name, as if the -s option had been specified.

4.68.4 Operands

None.

4.68.5 External Influences

4.68.5.1 Standard Input

None.

4.68.5.2 Input Files

None.

4.68.5.3 Environment Variables

The following environment variables shall affect the execution of uname:

    LANG
        This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
    LC_ALL
        This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
    LC_CTYPE
        This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).
    LC_MESSAGES
        This variable shall determine the language in which messages should be written.

4.68.5.4 Asynchronous Events

Default.

4.68.6 External Effects

4.68.6.1 Standard Output

By default, the output shall be a single line of the following form:

    "%s\n", <sysname>

If the -a option is specified, the output shall be a single line of the following form:

    "%s %s %s %s %s\n", <sysname>, <nodename>, <release>,
        <version>, <machine>

Additional implementation-defined symbols may be written; all such symbols shall be written at the end of the line of output before the <newline>.

If options are specified to select different combinations of the symbols, only those symbols shall be written, in the order shown above for the -a option. If a symbol is not selected for writing, its corresponding
trailing <blank>s also shall not be written.

4.68.6.2 Standard Error

Used only for diagnostic messages.

4.68.6.3 Output Files

None.

4.68.7 Extended Description

None.

4.68.8 Exit Status

The uname utility shall exit with one of the following values:

    0       The requested information was successfully written.
    >0      An error occurred.

4.68.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.68.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The following command:

    uname -sr

writes the operating system name and release level, separated by one or more <blank>s.

Note that any of the symbols could include embedded <blank>s, which may affect parsing algorithms if multiple options are selected for output.

The node name is typically a name that the system uses to identify itself for intersystem communication addressing.

History of Decisions Made

It was suggested that this utility cannot be used portably, since the format of the symbols is implementation defined. The POSIX.1 {8} working group could not achieve consensus on defining these formats in the underlying uname() function, and there is no expectation that POSIX.2 would be any more successful. In any event, some applications may still find this historical utility of value. For example, the symbols could be used for system log entries or for comparison with operator or user input.
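Because any symbol may contain embedded <blank>s, a script that needs several symbols can ask for each one separately rather than splitting the output of uname -a into words. The following is a minimal sketch of that approach; the variable names and the log-tag format are illustrative only.

```sh
#!/bin/sh
# Ask for each symbol with its own option instead of splitting the
# output of uname -a into words: any symbol may contain embedded
# <blank>s (on many systems the version string does), so splitting
# -a output into fields is unreliable.
sysname=$(uname -s)
nodename=$(uname -n)
release=$(uname -r)
machine=$(uname -m)

# Example use: a tag for a system log entry.
printf '%s %s (%s) on %s\n' "$sysname" "$release" "$machine" "$nodename"
```

The values themselves remain implementation defined, so the sketch only relies on their order and separation, not on their contents.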
END_RATIONALE

4.69 uniq - Report or filter out repeated lines in a file

4.69.1 Synopsis

    uniq [-c|-d|-u] [-f fields] [-s chars] [input_file [output_file]]

    Obsolescent Version:

    uniq [-c|-d|-u] [-n] [+m] [input_file [output_file]]

4.69.2 Description

The uniq utility shall read an input file comparing adjacent lines, and write one copy of each input line on the output. The second and succeeding copies of repeated adjacent input lines shall not be written. Repeated lines in the input shall not be detected if they are not adjacent.

4.69.3 Options

The uniq utility shall conform to the utility argument syntax guidelines described in 2.10.2; the obsolescent version does not, as one of the options begins with + and the -n and +m options do not have option letters.

The following options shall be supported by the implementation:

    -c
        Precede each output line with a count of the number of times the line occurred in the input.
    -d
        Suppress the writing of lines that are not repeated in the input.
    -f fields
        Ignore the first fields fields on each input line when doing comparisons, where fields shall be a positive decimal integer. A field is the maximal string matched by the basic regular expression:

            [[:blank:]]*[^[:blank:]]*

        If the fields option-argument specifies more fields than appear on an input line, a null string shall be used for comparison.
    -s chars
        Ignore the first chars characters when doing comparisons, where chars shall be a positive decimal integer. If specified in conjunction with the -f option, the first chars characters after the first fields fields shall be ignored.
        If the chars option-argument specifies more characters than remain on an input line, a null string shall be used for comparison.
    -u
        Suppress the writing of lines that are repeated in the input.
    -n
        (Obsolescent.) Equivalent to -f fields with fields set to n.
    +m
        (Obsolescent.) Equivalent to -s chars with chars set to m.

4.69.4 Operands

The following operands shall be supported by the implementation:

    input_file
        A pathname of the input file. If the input_file operand is not specified, or if the input_file is -, the standard input shall be used.
    output_file
        A pathname of the output file. If the output_file operand is not specified, the standard output shall be used. The results are unspecified if the file named by output_file is the file named by input_file.

4.69.5 External Influences

4.69.5.1 Standard Input

The standard input shall be used only if no input_file operand is specified or if input_file is -. See Input Files.

4.69.5.2 Input Files

The input file shall be a text file.

4.69.5.3 Environment Variables

The following environment variables shall affect the execution of uniq:

    LANG
        This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
    LC_ALL
        This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
    LC_CTYPE
        This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files) and which characters constitute a <blank> in the current locale.
    LC_MESSAGES
        This variable shall determine the language in which messages should be written.

4.69.5.4 Asynchronous Events

Default.

4.69.6 External Effects

4.69.6.1 Standard Output

The standard output shall be used only if no output_file operand is specified. See Output Files.

4.69.6.2 Standard Error

Used only for diagnostic messages.

4.69.6.3 Output Files

If the -c option is specified, the output file shall be empty or each line will be of the form:

    "%d %s", <number of duplicates>, <line>

otherwise, the output file will be empty or each line will be of the form:

    "%s", <line>

4.69.7 Extended Description

None.

4.69.8 Exit Status

The uniq utility shall exit with one of the following values:

    0       The utility executed successfully.
    >0      An error occurred.

4.69.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.69.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

Some historical implementations have limited lines to 1080 bytes in length, which will not meet the implied {LINE_MAX} limit.

The sort utility (see 4.58) can be used to cause repeated lines to be adjacent in the input file.
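A few one-line pipelines show both the adjacency rule and the effect of skipping fields and characters. This is a sketch assuming an implementation that follows the -f and -s rules above; the sample lines are invented for illustration and are unrelated to the test file below.

```sh
#!/bin/sh
# 1. uniq compares only adjacent lines: the two "b" lines are not seen
#    as repeated until sort has made them adjacent.
printf 'b\na\nb\n' | uniq -d                # writes nothing
printf 'b\na\nb\n' | sort | uniq -d         # writes: b

# 2. -f 1 skips the first field (a tag such as "#2") before comparing,
#    so the last two lines compare equal as " bar".
printf '%s\n' '#1 foo' '#2 bar' '#3 bar' | uniq -d -f 1   # writes: #2 bar

# 3. After skipping field 1, "#1 xab" leaves " xab"; -s 2 then skips
#    the <blank> and the "x", so only "ab" is compared.
printf '%s\n' '#1 xab' '#2 yab' | uniq -d -f 1 -s 2       # writes: #1 xab
```

With -d, only the first line of each repeated group is written, which is why the tag of the earlier line survives in cases 2 and 3.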
The following input file data (but flushed left) was used for a test series on uniq:

    #01 foo0 bar0 foo1 bar1
    #02 bar0 foo1 bar1 foo1
    #03 foo0 bar0 foo1 bar1
    #04
    #05 foo0 bar0 foo1 bar1
    #06 foo0 bar0 foo1 bar1
    #07 bar0 foo1 bar1 foo0

What follows is a series of test invocations of the uniq utility that use a mixture of uniq's options against the input file data. These tests verify the meaning of adjacent. The uniq utility views the input data as a sequence of strings delimited by \n. Accordingly, uniq decides whether the nth member of the sequence is unique or repeated strictly relative to the (n+1)th member.

This first example tests the line counting option, comparing each line of the input file data starting from the second field:

    uniq -c -f 1 uniq_0I.t
    1 #01 foo0 bar0 foo1 bar1
    1 #02 bar0 foo1 bar1 foo1
    1 #03 foo0 bar0 foo1 bar1
    1 #04
    2 #05 foo0 bar0 foo1 bar1
    1 #07 bar0 foo1 bar1 foo0

The number 2, prefixing the fifth line of output, signifies that the uniq utility detected a pair of repeated lines. Given the input data, this can only be true when uniq is run using the -f 1 option (which causes uniq to ignore the first field on each input line).

The second example tests the option to suppress unique lines, comparing each line of the input file data starting from the second field:
    uniq -d -f 1 uniq_0I.t
    #05 foo0 bar0 foo1 bar1

This test suppresses repeated lines, comparing each line of the input file data starting from the second field:

    uniq -u -f 1 uniq_0I.t
    #01 foo0 bar0 foo1 bar1
    #02 bar0 foo1 bar1 foo1
    #03 foo0 bar0 foo1 bar1
    #04
    #07 bar0 foo1 bar1 foo0

This suppresses unique lines, comparing each line of the input file data starting from the third character:

    uniq -d -s 2 uniq_0I.t

In the last example, the uniq utility found no input matching the above criteria.

History of Decisions Made

The -f and -s options were added to replace the obsolescent -n and +m options so that uniq could meet the syntax guidelines in an upward-compatible way.

The output specifications in Output Files do not show a terminating <newline> because they both specify <line>, which includes its own <newline> (because of the definition of line).

END_RATIONALE

4.70 wait - Await process completion

4.70.1 Synopsis

    wait [pid ...]

4.70.2 Description

When an asynchronous list (see 3.9.3.1) is started by the shell, the process ID of the last command in each element of the asynchronous list shall become known in the current shell execution environment; see 3.12.

If the wait utility is invoked with no operands, it shall wait until all process IDs known to the invoking shell have terminated and exit with a zero exit status.

If one or more pid operands are specified that represent known process IDs, the wait utility shall wait until all of them have terminated. If one or more pid operands are specified that represent unknown process IDs, wait shall treat them as if they were known process IDs that exited with exit status 127.
The exit status returned by the wait utility shall be the exit status of the process requested by the last pid operand.

The known process IDs are applicable only for invocations of wait in the current shell execution environment.

4.70.3 Options

None.

4.70.4 Operands

The following operand shall be supported by the implementation:

    pid
        The unsigned decimal integer process ID of a command whose termination the utility is to wait for.

4.70.5 External Influences

4.70.5.1 Standard Input

None.

4.70.5.2 Input Files

None.

4.70.5.3 Environment Variables

The following environment variables shall affect the execution of wait:

    LANG
        This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
    LC_ALL
        This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
    LC_CTYPE
        This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).
    LC_MESSAGES
        This variable shall determine the language in which messages should be written.

4.70.5.4 Asynchronous Events

Default.

4.70.6 External Effects

4.70.6.1 Standard Output

None.

4.70.6.2 Standard Error

Used only for diagnostic messages.

4.70.6.3 Output Files

None.

4.70.7 Extended Description

None.
4.70.8 Exit Status

If one or more operands were specified, all of them have terminated or were not known by the invoking shell, and the status of the last operand specified is known, then the exit status of wait shall be the exit status information of the command indicated by the last operand specified. If the process terminated abnormally due to the receipt of a signal, the exit status shall be greater than 128 and shall be distinct from the exit status generated by other signals, but the exact value is unspecified. (See the kill -l option in 4.32.) Otherwise, the wait utility shall exit with one of the following values:

    0       The wait utility was invoked with no operands and all process IDs known by the invoking shell have terminated.
    1-126   The wait utility detected an error.
    127     The command identified by the last pid operand specified is unknown.

4.70.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.70.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

On most implementations, wait is a shell built-in. If it is called in a subshell or separate utility execution environment, such as one of the following:

    (wait)
    nohup wait ...
    find . -exec wait ... \;

it will return immediately because there will be no known process IDs to wait for in those environments.

Although the exact value used when a process is terminated by a signal is unspecified, if it is known that a signal terminated a process, a script can still reliably determine which signal it was by using kill, as shown by the following script:

    sleep 1000&
    pid=$!
    kill -kill $pid
    wait $pid
    echo $pid was terminated by a SIG$(kill -l $?) signal.
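The kill -l technique in the script above can be wrapped in a small helper that reports whether a status from wait looks like a normal exit or a signal. The name describe_status is hypothetical; because a process can also exit with a value above 128, the result is a best guess rather than a certainty.

```sh
#!/bin/sh
# Hypothetical helper (not part of the standard): classify a status
# returned by wait.  Values above 128 are treated as "probably a
# signal" and mapped to a signal name with kill -l.
describe_status() {
    if [ "$1" -gt 128 ]; then
        printf 'signal %s\n' "$(kill -l "$1")"
    else
        printf 'exit %s\n' "$1"
    fi
}

sleep 1000 &
pid=$!
kill -s KILL "$pid"          # -s KILL names the signal portably
wait "$pid"
describe_status "$?"         # signal KILL
```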
Historical implementations of interactive shells have discarded the exit status of terminated background processes before each shell prompt. Therefore, the status of a background process was usually lost unless it terminated while wait was waiting for it. This could be a serious problem when a job that was expected to run for a long time actually terminated quickly with a syntax or initialization error, because the exit status returned was usually zero if the requested process ID was not found.

POSIX.2 requires the implementation to keep the status of terminated jobs available until the status is requested, so that scripts like:

    j1&
    p1=$!
    j2&
    wait $p1
    echo Job 1 exited with status $?
    wait $!
    echo Job 2 exited with status $?

will work without losing status on any of the jobs. The shell is allowed to discard the status of any process if it determines that the application cannot get that process ID from the shell. It is also only required to remember {CHILD_MAX} processes in this way. Since the only way to get the process ID from the shell is by using the ! shell parameter, the shell is allowed to discard the status of an asynchronous list if $! was not referenced before another asynchronous list was started. (This means that the shell only has to keep the status of the last asynchronous list started if the application did not reference $!. If the implementation of the shell is smart enough to determine that a reference to $! was not ``saved'' anywhere that the application can retrieve it later, it can use this information to trim the list of saved information. Note also that a successful call to wait with no operands discards the exit status of all asynchronous lists.)
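A runnable version of the j1/j2 pattern above can be sketched with subshells standing in for the hypothetical jobs j1 and j2; their exit statuses are chosen arbitrarily so the results can be checked. The second job finishes long before it is waited for, so its status must have been retained. The last part shows that with several operands the status is that of the last operand.

```sh
#!/bin/sh
# Stand-ins for the jobs j1 and j2 above, with known exit statuses.
(sleep 1; exit 3) &
p1=$!
(exit 5) &              # terminates long before anyone waits for it
p2=$!

wait "$p1"
s1=$?
wait "$p2"              # its status is still available here
s2=$?
echo "job 1: $s1, job 2: $s2"      # job 1: 3, job 2: 5

# With several operands, wait reports the status of the last one.
(exit 7) & p3=$!
(exit 9) & p4=$!
wait "$p3" "$p4"
last=$?
echo "last operand: $last"         # 9
```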
This new functionality was added because it is needed to accurately determine the exit status of any asynchronous list. The only compatibility problem that this change creates is for a script like:

    while sleep 60
    do
        job&
        echo Job started $(date) as $!
    done

which will cause the shell to keep track of all of the jobs started until the script terminates or runs out of memory. This would not be a problem if the loop did not reference $! or if the script occasionally waited for the jobs it started.

If the exit status of wait is greater than 128, there is no way for the application to know if the waited-for process exited with that value or was killed by a signal. Since most utilities exit with small values, there is seldom any ambiguity. Even in the ambiguous cases, most applications just need to know that the asynchronous job failed; it does not matter whether it detected an error and failed or was killed and did not complete its job normally.

History of Decisions Made

The description of wait does not refer to the waitpid() function from POSIX.1 {8}, because that would needlessly overspecify this interface. However, the wording requires wait to wait for an explicit process when it is given an argument, so that the status information of other processes is not consumed. Historical implementations call POSIX.1 {8} wait() repeatedly until wait() returns the requested process ID or finds that the requested process does not exist. Because this means that a shell script could not reliably get the status of all background children if a second background job was ever started before the first job finished, it is recommended that the wait utility use a method such as the functionality provided by the waitpid() function in POSIX.1 {8}.

The ability to wait for multiple pid operands was adopted from the KornShell at the request of ballot comments and objections.
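One way to avoid the unbounded bookkeeping of the while loop discussed above is to wait for jobs in groups. The sketch below is a hypothetical variant: run_batched and the group size of four are illustrative choices, and the printf job is a stand-in for real work. It never references $!, and each wait with no operands lets the shell discard the saved statuses of the group.

```sh
#!/bin/sh
# Hypothetical batch runner: start jobs in groups of at most four and
# wait for each group, so the shell never has to remember the status
# of more than a handful of terminated asynchronous lists.
run_batched() {
    started=0
    for item in "$@"; do
        printf '%s\n' "$item" &      # stand-in for real work
        started=$((started + 1))
        if [ "$started" -ge 4 ]; then
            wait                     # no operands: wait for every known process ID
            started=0
        fi
    done
    wait                             # collect the final, possibly partial, group
}

# Output order depends on scheduling, so sort it for display.
run_batched a b c d e f | sort
```

The trade-off is that a slow job in a group delays the start of the next group; the gain is that memory use no longer grows with the number of jobs started.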
Some implementations of wait support waiting for asynchronous lists identified by the use of job identifiers. For example, wait %1 would wait for the first background job. This standard does not address job control issues, but allows these features to be added as extensions. Job control facilities will be provided by the UPE.

END_RATIONALE

4.71 wc - Word, line, and byte count

4.71.1 Synopsis

    wc [-clw] [file ...]

4.71.2 Description

The wc utility shall read one or more input files and, by default, write the number of <newline>s, words, and bytes contained in each input file to the standard output.

The utility also shall write a total count for all named files, if more than one input file is specified.

The wc utility shall consider a word to be a nonzero-length string of characters delimited by white space.

4.71.3 Options

The wc utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

    -c
        Write to the standard output the number of bytes in each input file.
    -l
        Write to the standard output the number of <newline>s in each input file.
    -w
        Write to the standard output the number of words in each input file.

When any option is specified, wc shall report only the information requested by the specified option(s).

4.71.4 Operands

The following operand shall be supported by the implementation:

    file
        A pathname of an input file. If no file operands are specified, the standard input shall be used.
4.71.5 External Influences

4.71.5.1 Standard Input

The standard input shall be used only if no file operands are specified. See Input Files.

4.71.5.2 Input Files

The input files may be of any type.

4.71.5.3 Environment Variables

The following environment variables shall affect the execution of wc:

    LANG
        This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
    LC_ALL
        This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
    LC_CTYPE
        This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files) and which characters are defined as ``white space'' characters.
    LC_MESSAGES
        This variable shall determine the language in which messages should be written.

4.71.5.4 Asynchronous Events

Default.

4.71.6 External Effects

4.71.6.1 Standard Output

By default, the standard output shall contain a line for each input file of the form:

    "%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>

If any options are specified and the -l option is not specified, the number of <newline>s shall not be written. If any options are specified and the -w option is not specified, the number of words shall not be written. If any options are specified and the -c option is not specified, the number of bytes shall not be written.

If no input file operands are specified, no name shall be written and no <blank>s preceding the pathname shall be written.
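The output rules above make wc easy to consume from a script if the shell's field splitting is used to read the numbers, since implementations differ in how many <blank>s they write around them. A minimal sketch, with an arbitrary temporary file name:

```sh
#!/bin/sh
# Counts for a small file: "one two\nthree\n" contains 2 <newline>s,
# 3 words and 14 bytes.  The temporary file name is arbitrary.
f=${TMPDIR:-/tmp}/wc_demo.$$
trap 'rm -f "$f"' EXIT
printf 'one two\nthree\n' > "$f"

# Reading standard input means no pathname is written; letting the
# shell split the result also discards whatever <blank>s the
# implementation writes before or between the numbers.
set -- $(wc < "$f")
echo "newlines=$1 words=$2 bytes=$3"    # newlines=2 words=3 bytes=14

# -l counts <newline> characters, so a final line without one is not
# counted: this writes 0 (possibly preceded by <blank>s).
printf 'abc' | wc -l
```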
If more than one input file operand is specified, an additional line shall be written, of the same format as the other lines, except that the word total (in the POSIX Locale) shall be written instead of a pathname and the total of each column shall be written as appropriate. Such an additional line, if any, shall be written at the end of the output.

4.71.6.2 Standard Error

Used only for diagnostic messages.

4.71.6.3 Output Files

None.

4.71.7 Extended Description

None.

4.71.8 Exit Status

The wc utility shall exit with one of the following values:

    0       Successful completion.
    >0      An error occurred.

4.71.9 Consequences of Errors

Default.

BEGIN_RATIONALE

4.71.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

None.

History of Decisions Made

The output file format pseudo-printf() string was derived from the HP-UX version of wc; the System V version:

    "%7d%7d%7d %s\n"

produces possibly ambiguous and unparsable results for very large files, as it assumes no number will exceed six digits.

Some historical implementations use only <space>, <tab>, and <newline> as word separators. The equivalent of the C Standard {7} isspace() function is more appropriate.

The -c option stands for ``character'' count, even though it counts bytes. This stems from the sometimes erroneous historical view that bytes and characters are the same size.

Earlier drafts only specified the results when input files were text files. The current specification more closely matches existing practice. (Bytes, words, and <newline>s are counted separately and the results are written when an end-of-file is detected.)

Historical implementations of the wc utility only accepted one argument to specify the options -c, -l, and -w.
Some of them also let multiple occurrences of an option cause the corresponding count to be written multiple times, and let the order in which the options were specified affect the order of the fields on output, but documented neither behavior. Because common usage either specifies no options or only one option, and because none of this was documented, the changes required by this standard should not break many existing applications (and do not break any historical portable applications).

END_RATIONALE

4.72 xargs - Construct argument list(s) and invoke utility

4.72.1 Synopsis

    xargs [-t] [-n number [-x]] [-s size] [utility [argument ...]]

4.72.2 Description

The xargs utility shall construct a command line consisting of the utility and argument operands specified followed by as many arguments read in sequence from standard input as will fit in length and number constraints specified by the options. The xargs utility shall then invoke the constructed command line and wait for its completion. This sequence shall be repeated until an end-of-file condition is detected on standard input or an invocation of a constructed command line returns an exit status of 255.

Arguments in the standard input shall be separated by unquoted <blank>s, or unescaped <blank>s or <newline>s. A string of zero or more nondouble-quote (") and non-<newline> characters can be quoted by enclosing them in double-quotes. A string of zero or more nonapostrophe (') and non-<newline> characters can be quoted by enclosing them in apostrophes. Any unquoted character can be escaped by preceding it with a backslash.

The utility shall be executed one or more times until the end-of-file is reached. The results are unspecified if the utility named by utility attempts to read from its standard input.
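The input parsing rules above, and the number constraint imposed by the -n option (4.72.3), can be observed directly. In this sketch printf is used as the utility only so that each argument is shown inside <...>; the input strings are made up for illustration.

```sh
#!/bin/sh
# Arguments are separated by <blank>s and <newline>s; double-quotes,
# apostrophes and a backslash keep embedded <blank>s inside one
# argument.  Each argument is printed on its own line as <...>.
xargs -n 1 printf '<%s>\n' <<'EOF'
"two words" 'also two' back\ slash
plain
EOF
# <two words>
# <also two>
# <back slash>
# <plain>

# Number constraint: five arguments, at most two per invocation, so
# echo runs three times and the last run receives one argument.
printf '%s\n' a b c d e | xargs -n 2 echo
# a b
# c d
# e
```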
The generated command line length shall be the sum of the size in bytes of the utility name and each argument treated as strings, including a null byte terminator for each of these strings. The xargs utility shall limit the command line length such that when the command line is invoked, the combined argument and environment lists (see the exec family of functions in POSIX.1 {8} 3.1.2) shall not exceed {ARG_MAX}-2048 bytes. Within this constraint, if neither the -n nor the -s option is specified, the default command line length shall be at least {LINE_MAX}.

4.72.3 Options

The xargs utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

    -n number
        Invoke utility using as many standard input arguments as possible, up to number (a positive decimal integer) arguments maximum. Fewer arguments shall be used if:

         - The command line length accumulated exceeds the size specified by the -s option (or {LINE_MAX} if there is no -s option), or
         - The last iteration has fewer than number, but not zero, operands remaining.

    -s size
        Invoke utility using as many standard input arguments as possible yielding a command line length less than size (a positive decimal integer) bytes. Fewer arguments shall be used if:

         - The total number of arguments exceeds that specified by the -n option, or
         - End of file is encountered on standard input before size bytes are accumulated.

        Implementations shall support values of size up to at least {LINE_MAX} bytes, provided that the constraints specified in 4.72.2 are met.
        It shall not be considered an error if a value larger than that supported by the implementation or exceeding the constraints specified in 4.72.2 is given; xargs shall use the largest value it supports within the constraints.
    -t
        Enable trace mode. Each generated command line shall be written to standard error just prior to invocation.
    -x
        Terminate if a command line containing number arguments (see the -n option above) will not fit in the implied or specified size (see the -s option above).

4.72.4 Operands

The following operands shall be supported by the implementation:

    utility
        The name of the utility to be invoked, found by search path using the PATH environment variable, described in 2.6. If utility is omitted, the default shall be the echo utility (see 4.19). If the utility operand names any of the special built-in utilities in 3.14, the results are undefined.
    argument
        An initial option or operand for the invocation of utility.

4.72.5 External Influences

4.72.5.1 Standard Input

The standard input shall be a text file. The results are unspecified if an end-of-file condition is detected immediately following an escaped <newline>.

4.72.5.2 Input Files

None.

4.72.5.3 Environment Variables

The following environment variables shall affect the execution of xargs:

    LANG
        This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.
    LC_ALL
        This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

4.72.5.4 Asynchronous Events

Default.

4.72.6 External Effects

Any external effects are a result of the invocation of the utility named by the utility operand, in a manner specified by that utility.

4.72.6.1 Standard Output

None.

4.72.6.2 Standard Error

Used for diagnostic messages and the -t option. If the -t option is specified, the utility and its constructed argument list shall be written to standard error, as it will be invoked, prior to invocation.

4.72.6.3 Output Files

None.

4.72.7 Extended Description

None.

4.72.8 Exit Status

The xargs utility shall exit with one of the following values:

0
    All invocations of utility returned exit status zero.

1-125
    A command line meeting the specified requirements could not be assembled, one or more of the invocations of utility returned a nonzero exit status, or some other error occurred.

126
    The utility specified by utility was found but could not be invoked.

127
    The utility specified by utility could not be found.
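The exit-status ranges above can be observed directly from a script. This is a minimal sketch, assuming a POSIX sh and an xargs on the search path; classify is a hypothetical helper, and no_such_utility_xyz is a placeholder name assumed not to exist on PATH. Because the value chosen within 1-125 is left to the implementation, the sketch checks only the range, never a specific number.

```sh
#!/bin/sh
# Sketch: map an xargs exit status onto the ranges listed in 4.72.8.
# The exact value inside 1-125 is implementation-defined, so only the
# range is examined.
classify() {
    case $1 in
        0)   echo "all invocations returned zero" ;;
        126) echo "utility found but could not be invoked" ;;
        127) echo "utility not found" ;;
        *)   if [ "$1" -ge 1 ] && [ "$1" -le 125 ]; then
                 echo "assembly failure, nonzero utility status, or other error"
             else
                 echo "outside the ranges of 4.72.8"
             fi ;;
    esac
}

echo x | xargs true;                            classify "$?"
echo x | xargs sh -c 'exit 3' sh;               classify "$?"
echo x | xargs no_such_utility_xyz 2>/dev/null; classify "$?"
```

On a conforming system the three calls should report, in order, a zero status, a status in 1-125 (the invoked sh exits with 3), and a not-found status of 127.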
4.72.9 Consequences of Errors

If a command line meeting the specified requirements cannot be assembled, the utility cannot be invoked, an invocation of the utility is terminated by a signal, or an invocation of the utility exits with exit status 255, the xargs utility shall write a diagnostic message and exit without processing any remaining input.

BEGIN_RATIONALE

4.72.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The xargs utility is usually found only in System V-based systems; BSD systems provide an apply utility that offers functionality similar to xargs -n number. The SVID lists xargs as a software development extension; POSIX.2 does not share the view that it is used only for development, and therefore it is not optional.

Note that input is parsed as lines and <blank>s separate arguments. If xargs is used to bundle output of commands like find dir -print or ls into commands to be executed, unexpected results are likely if any filenames contain any <blank>s or <newline>s. This can be fixed by using find to call a script that converts each file found into a quoted string that is then piped to xargs. Note that the quoting rules used by xargs are not the same as in the shell. They were not made consistent here because existing applications depend on the current rules and the shell syntax is not fully compatible with them. An easy rule that can be used to transform any string into a quoted form that xargs will interpret correctly is to precede each character in the string with a backslash.
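The backslash rule in the last paragraph can be applied mechanically with sed. This is a minimal sketch, assuming each string sits on its own input line. A string containing an embedded <newline> cannot be protected this way, because sed processes one line at a time. The function name xargs_quote is hypothetical.

```sh
#!/bin/sh
# Precede every character of each input line with a backslash, so that
# xargs passes each line through as exactly one argument.
# Limitation: embedded <newline>s are not handled (sed works line by line).
xargs_quote() {
    sed 's/./\\&/g'
}

# A name containing <blank>s and a quote character survives intact:
printf '%s\n' "it's a file.txt" 'plain' | xargs_quote | xargs printf '[%s]\n'
```

The final pipeline should print [it's a file.txt] and [plain] on separate lines. Without xargs_quote, the apostrophe would start an xargs quoted string, and the <blank>s would split the first name into several arguments.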
The following command will combine the output of the parenthesized commands onto one line, which is then appended to the file log:

    (logname; date; printf "%s\n" "$0 $*") | xargs >>log

The following command will invoke diff with successive pairs of arguments originally typed as command line arguments (assuming there are no embedded <blank>s in the elements of the original argument list):

    printf "%s\n" "$*" | xargs -n 2 -x diff

On implementations with a large value for {ARG_MAX}, xargs may produce command lines longer than {LINE_MAX}. For invocation of utilities, this is not a problem. If xargs is being used to create a text file, users should explicitly set the maximum command line length with the -s option.

History of Decisions Made

The list of options has been scaled down extensively. As it had stood, the xargs utility did not exhibit an economy of powerful, modular, or extensible functionality. The classic application of the xargs utility is in conjunction with the find utility to reduce the number of processes launched by a simplistic use of the find -exec combination. The xargs utility is also used to enforce an upper limit on memory required to launch a process. With this basis in mind, POSIX.2 selected only the minimal features required.

The -n number option was classically used to invoke a utility using pairs of operands, yet the general case has problems when utility spawns child processes of its own. The xargs utility can sap resources from these children, especially those sharing the parent's environment.
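The length definition of 4.72.2, and the remark about {ARG_MAX} and {LINE_MAX}, can be made concrete with some simple arithmetic. This is a sketch under stated assumptions. cmdline_len and xargs_budget are hypothetical helpers, and getconf ARG_MAX supplies the limit. The environment is measured with env | wc -c, which counts each NAME=value entry plus one trailing newline that stands in for the entry's null terminator.

```sh
#!/bin/sh
# Command line length per 4.72.2: every string (utility name and each
# argument) counts its bytes plus one for the terminating null byte.
cmdline_len() {
    total=0
    for s do
        # wc -c counts bytes regardless of locale; the arithmetic
        # expansion discards any leading blanks wc may print.
        total=$(( total + $(printf '%s' "$s" | wc -c) + 1 ))
    done
    echo "$total"
}

# Bytes left for generated command lines: {ARG_MAX}, minus the 2048-byte
# reserve, minus the environment that will be passed along unchanged.
xargs_budget() {
    echo $(( $(getconf ARG_MAX) - 2048 - $(env | wc -c) ))
}

cmdline_len diff old.c new.c   # (4+1) + (5+1) + (5+1)
```

The example call prints 17. Comparing xargs_budget with getconf LINE_MAX shows how much longer than {LINE_MAX} a generated line may become on a given system. That gap is why the text recommends an explicit -s when the output is meant to be a text file.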
The command, env, nohup, and xargs utilities have been specified to use exit code 127 when the utility to be invoked cannot be found, so that applications can distinguish ``failure to find a utility'' from ``invoked utility exited with an error indication.'' The value 127 was chosen because it is not commonly used for other meanings; most utilities use small values for ``normal error conditions'' and the values above 128 can be confused with termination due to receipt of a signal. The value 126 was chosen in a similar manner to indicate that the utility could be found, but not invoked. Some scripts produce meaningful error messages differentiating the 126 and 127 cases. The distinction between exit codes 126 and 127 is based on KornShell practice that uses 127 when all attempts to exec the utility fail with [ENOENT], and uses 126 when any attempt to exec the utility fails for any other reason.

Although the 255 exit status is mostly an accident of historical implementations, it allows a utility being used by xargs to tell xargs to terminate if it knows no further invocations using the current data stream will succeed. Any nonzero exit status from a utility will fall into the 1-125 range when xargs exits. There is no statement of how the various nonzero utility exit status codes are accumulated by xargs. The value could be the addition of all codes, their highest value, the last one received, or a single value such as 1. Since no algorithm is arguably better than the others, and since many of the POSIX.2 standard utilities say little more (portably) than ``pass/fail,'' no new algorithm was invented.

Several other xargs options were withdrawn because simple alternatives already exist within the standard. For example, the -e eofstr option has a sed workaround. The -i replstr option can be just as efficiently
performed using a shell for loop. Since, with -i, xargs calls exec() once for each input line, the -i option will usually not exploit the grouping capabilities of xargs.

The -s option was reinstated since many of the balloters on Draft 8 felt that it was preferable to the -r option, invented for that draft, which required the implementation to use {ARG_MAX} - size bytes for command lines.

The requirement that xargs never produce command lines such that invocation of utility is within 2048 bytes of hitting the POSIX.1 {8} exec {ARG_MAX} limitations is intended to guarantee that the invoked utility has a little bit of room to modify its environment variables and command line arguments and still be able to invoke another utility. Note that the minimum {ARG_MAX} allowed by POSIX.1 {8} is 4096 bytes and the minimum value allowed by POSIX.2 is 2048 bytes; therefore, the 2048-byte difference seems reasonable. Note, however, that xargs may never be able to invoke a utility if the environment passed in to xargs comes close to using {ARG_MAX} bytes.

The version of xargs required by POSIX.2 is required to wait for the completion of the invoked command before invoking another command. This was done because existing scripts using xargs assumed sequential execution. Implementations wanting to provide parallel operation of the invoked utilities are encouraged to add an option enabling parallel invocation, but should still wait for termination of all of the children before xargs terminates normally.

END_RATIONALE
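The workarounds that the rationale mentions for the withdrawn -e and -i options can be sketched as follows. These are illustrations under stated assumptions, not definitive replacements. The logical end-of-file marker is taken to be a line consisting solely of END; historical -e matched an argument, and this line-based sed version only approximates that. The loop is a while-read loop rather than a for loop, because it keeps each line, <blank>s included, as one value. It also appends the line as the last argument, whereas -i could substitute it anywhere. Both function names are hypothetical.

```sh
#!/bin/sh
# Stand-in for historical "xargs -e END": pass lines through until the
# first line that is exactly END, dropping that line and all later ones.
until_eof_marker() {
    sed -n '/^END$/q; p'
}

# Stand-in for historical "xargs -i cmd {}": run the command once per
# input line, supplying the whole line as its final argument.
per_line() {
    while IFS= read -r line; do
        "$@" "$line"
    done
}

printf '%s\n' a b END c | until_eof_marker | xargs echo
printf '%s\n' 'one file' two | per_line printf '<%s>\n'
```

The first pipeline should print a b. The second should print <one file> and <two> on separate lines, with one invocation per line, just as the rationale notes for -i.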
Section 5: User Portability Utilities Option

BEGIN_RATIONALE

Editor's Note: This empty section is a placeholder for a future revision (the User Portability Extension, P1003.2a) to contain descriptions of utilities that are suitable for user portability on asynchronous character terminals. P1003.2a is currently balloting within the IEEE. Contact the IEEE Standards Office to obtain a copy of the latest draft.

END_RATIONALE

Section 6: Software Development Utilities Option

This section describes utilities used for the development of applications, including compilation or translation of source code, the creation and maintenance of library archives, and the maintenance of groups of interdependent programs.

The utilities described in this section may be provided by the conforming system; however, any system claiming conformance to the Software Development Utilities Option shall provide all of the utilities described here.

6.1 ar - Create and maintain library archives

6.1.1 Synopsis

    ar -d [-v] archive file ...
    ar -p [-v] archive [file ...]
    ar -r [-cuv] archive file ...
    ar -t [-v] archive [file ...]
    ar -x [-v] archive [file ...]

6.1.2 Description

The ar utility can be used to create and maintain groups of files combined into an archive. Once an archive has been created, new files can be added, and existing files can be extracted, deleted, or replaced.
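The synopsis forms above correspond to the usual life cycle of an archive. This is a minimal sketch under stated assumptions. It requires an ar utility on the search path, and the scratch directory and file names (a.txt, b.txt, lib.a) are arbitrary examples. The members are text files, so, as the next paragraph explains, the archive is not expected to be usable as a link-editor library.

```sh
#!/bin/sh
# Create, list, extract from, and delete from an archive using only the
# option forms of 6.1.1.  The function body is a subshell, so the cd
# does not affect the caller.
ar_demo() (
    dir=${TMPDIR:-/tmp}/ar-demo.$$
    mkdir "$dir" && cd "$dir" || exit 1
    printf 'alpha\n' > a.txt
    printf 'beta\n'  > b.txt

    ar -rc lib.a a.txt b.txt   # -r adds both; -c suppresses the "created" diagnostic
    ar -t lib.a                # table of contents, in archive order
    rm a.txt
    ar -x lib.a a.txt          # extract a.txt; the archive itself is unchanged
    cat a.txt                  # contents restored from the archive
    ar -d lib.a b.txt          # delete the b.txt member
    ar -t lib.a                # only a.txt remains

    cd / && rm -rf "$dir"
)

ar_demo
```

The expected standard output is a.txt, b.txt, alpha, a.txt, one per line.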
When an archive consists entirely of valid object files, the implementation shall format the archive so that it is usable as a library for link editing (see A.1 and C.2). When some of the archived files are not valid object files, the suitability of the archive for library use is undefined.

All file operands can be pathnames. However, files within archives shall be named by a filename, which is the last component of the pathname used when the file was entered into the archive. The comparison of file operands to the names of files in archives shall be performed by comparing the last component of the operand to the name of the file in the archive. It is unspecified whether multiple files in the archive may be identically named. In the case of such files, however, each file operand shall match only the first file in the archive having a name that is the same as the last component of the file operand.

6.1.3 Options

The ar utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-c
    Suppress the diagnostic message that is written to standard error by default when the archive file archive is created.

-d
    Delete file(s) from archive.

-p
    Write the contents of the file(s) from archive to the standard output. If no file(s) are specified, the contents of all files in the archive shall be written in the order of the archive.

-r
    Replace or add file(s) to archive. If the archive named by archive does not exist, a new archive file shall be created and a diagnostic message shall be written to standard error (unless the -c option is specified). If no file(s) are specified and the archive exists, the results are undefined.
    Files that replace existing files shall not change the order of the archive. Files that do not replace existing files shall be appended to the archive.

-t
    Write a table of contents of archive to the standard output. The files specified by the file operands shall be included in the written list. If no file operands are specified, all files in archive shall be included in the order of the archive.

-u
    Update older files. When used with the -r option, files within the archive will be replaced only if the corresponding file has a modification time that is at least as new as the modification time of the file within the archive.

-v
    Give verbose output. When used with the option characters -d, -r, or -x, write a detailed file-by-file description of the archive creation and maintenance activity, as described in 6.1.6.1. When used with -p, write the name of the file to the standard output before writing the file itself to the standard output, as described in 6.1.6.1. When used with -t, include a long listing of information about the files within the archive, as described in 6.1.6.1.

-x
    Extract the files named by the file operands from archive. The contents of the archive file shall not be changed. If no file operands are given, all files in the archive shall be extracted. If the filename of a file extracted from the archive is longer than that supported in the directory to which it is being extracted, the results are undefined. The modification time of each file extracted shall be set to the time the file is extracted from the archive.

6.1.4 Operands

The following operands shall be supported by the implementation:

archive
    A pathname of the archive file.

file
    A pathname.
    Only the last component shall be used when comparing against the names of files in the archive. If two or more file operands have the same last pathname component (basename), the results are unspecified. The implementation's archive format shall not truncate valid filenames of files added to, or replaced in, the archive.

6.1.5 External Influences

6.1.5.1 Standard Input

None.

6.1.5.2 Input Files

The input file named by archive shall be a file in the format created by ar -r.

6.1.5.3 Environment Variables

The following environment variables shall affect the execution of ar:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

LC_TIME
    This variable shall determine the format and content for date and time strings written by ar.

6.1.5.4 Asynchronous Events

Default.

6.1.6 External Effects

6.1.6.1 Standard Output

If the -d option is used with the -v option, the standard output format is:

    "d - %s\n", <file>

where file is the operand specified on the command line.
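The matching rule for operands (any pathname is accepted, but only its last component is compared with member names) and the -d -v report format can be combined into a small model. This is a hypothetical sketch and not ar itself. The function ar_match_report and its argument convention (member names, then --, then operands) are invented for illustration, and its diagnostic wording for unmatched operands is arbitrary, since the standard leaves diagnostic formats unspecified.

```sh
#!/bin/sh
# Hypothetical model of how "ar -d -v" pairs operands with members: each
# operand's last pathname component is compared with the member names,
# and a match is reported in the "d - %s\n" format using the operand
# exactly as given on the command line.
# Usage: ar_match_report MEMBER... -- OPERAND...
ar_match_report() {
    members=/                      # e.g. "/a.o/b.o/"; member names contain no '/'
    while [ "$#" -gt 0 ] && [ "$1" != -- ]; do
        members=$members$1/
        shift
    done
    [ "$#" -gt 0 ] && shift        # drop the "--" separator
    for op do
        base=${op##*/}             # last pathname component ("" for "dir/")
        case $members in
            */"$base"/*) printf 'd - %s\n' "$op" ;;
            *)           printf 'ar: %s: not in archive\n' "$op" >&2 ;;
        esac
    done
}

ar_match_report a.o b.o -- src/a.o lib/c.o
```

Here src/a.o matches the member a.o and should be reported as d - src/a.o, while lib/c.o produces only a diagnostic on standard error.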
If the -p option is used with the -v option, ar shall precede the contents of each file with:

    "\n<%s>\n\n", <file>

where file is the operand specified on the command line, if file operands were specified, or the name of the file in the archive if they were not.

If the -r option is used with the -v option, and file is already in the archive, the standard output format is:

    "r - %s\n", <file>

where file is the operand specified on the command line.

If the -r option is used with the -v option, and file is being added to the archive, the standard output format is:

    "a - %s\n", <file>

where file is the operand specified on the command line.

If the -t option is used, ar writes the names of the files to the standard output in the format:

    "%s\n", <file>

where file is the operand specified on the command line, if file operands were specified, or the name of the file in the archive if they were not.

If the -t option is used with the -v option, the standard output format is:

    "%s %u/%u %u %s %d %d:%d %d %s\n", <member mode>, <user ID>, <group ID>, <number of bytes in member>, <abbreviated month>, <day-of-month>, <hour>, <minute>, <year>, <file>

where:

file
    shall be the operand specified on the command line, if file operands were specified, or the name of the file in the archive if they were not.

<member mode>
    shall be formatted the same as the <file mode> string defined in 4.39.6.1 (Standard Output of ls), except that the first character, the <entry type>, is not used; the string represents the file mode of the archive member at the time it was added to, or replaced in, the archive.
The following represent the last-modification time of a file when it was most recently added to or replaced in the archive:

<abbreviated month>
    shall be equivalent to the %b format in date (see 4.15).

<day-of-month>
    shall be equivalent to the %e format in date.

<hour>
    shall be equivalent to the %H format in date.

<minute>
    shall be equivalent to the %M format in date.

<year>
    shall be equivalent to the %Y format in date.

When LC_TIME does not specify the POSIX Locale, a different format and order of presentation of these fields relative to each other may be used in a format appropriate in the specified locale.

If the -x option is used with the -v option, the standard output format is:

    "x - %s\n", <file>

where file is the operand specified on the command line, if file operands were specified, or the name of the file in the archive if they were not.

6.1.6.2 Standard Error

Used only for diagnostic messages. The diagnostic message about creating a new archive when -c is not specified shall not modify the exit status.

6.1.6.3 Output Files

Archives are files with unspecified formats.

6.1.7 Extended Description

None.

6.1.8 Exit Status

The ar utility shall exit with one of the following values:

0
    Successful completion.

>0
    An error occurred.

6.1.9 Consequences of Errors

Default.

BEGIN_RATIONALE

6.1.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The archive format is not described. It is recognized that there are several known ar formats, which are not compatible.
The ar utility is being included, however, to allow creation of archives that are intended for use only on the same machine. The archive file is specified as a file and it can be moved as a file. This does allow an archive to be moved from one machine to another machine that uses the same implementation of ar.

Utilities such as pax (and its forebears tar and cpio) also provide portable ``archives.'' This is not a duplication; the ar interface is included in the standard to provide an interface primarily for make and the compilers, based on a historical model.

In historical implementations, the -q option is known to execute quickly because ar does not check whether the added members are already in the archive. This is useful to bypass the searching otherwise done when creating a large archive piece-by-piece. These remarks may or may not hold true for a brand-new POSIX.2 implementation; hence, they have been moved out of the specification and into the Rationale.

Likewise, historical implementations maintain a symbol table to speed searches, particularly when the archive contains object files. However, future implementors may or may not use a symbol table, and the -s option was removed from this clause to permit implementors freedom of choice. Instead, the requirement that archive libraries be suitable for link editing was added to ensure the intended functionality. Systems such as System V maintain the symbol table without requiring the use of -s, so adding -s (even if it were worded as allowing a no-op) would essentially require all portable applications to use it in all invocations involving libraries.

The Operands subclause requires what might seem to be true without specifying it: the archive cannot truncate the filenames below {NAME_MAX}. Some historical implementations do so, however, causing unexpected results for the application. Therefore, POSIX.2 makes the
requirement explicit to avoid misunderstandings.

According to the System V documentation, the options -dmpqrtx are not required to begin with a hyphen (-). POSIX.2 requires that a conforming application use the leading hyphen.

When extracting files with long filenames into a file system that supports only shorter filenames, an undefined condition occurs. Typical implementation actions might be one of the following:

- Extract and truncate the filename only when an existing file would not be overlaid.
- Extract and truncate the filename and overlay an existing file only if some extension such as another command-line option were used to override this safety feature.
- Refuse to extract any files unless an extension overrode the default.

The archive format used by the 4.4BSD implementation is documented in the rationale as an example:

    A file created by ar begins with the ``magic'' string ``!<arch>\n''. The rest of the archive is made up of objects, each of which is composed of a header for a file, a possible filename, and the file contents. The header is portable between machine architectures, and, if the file contents are printable, the archive is itself printable.

    The header is made up of six ASCII fields, followed by a two-character trailer. The fields are the object name (16 characters), the file last modification time (12 characters), the user and group IDs (each 6 characters), the file mode (8 characters) and the file size (10 characters). All numeric fields are in decimal, except for the file mode, which is in octal.

    The modification time is the file st_mtime field. The user and group IDs are the file st_uid and st_gid fields. The file mode is the file st_mode field. The file size is the file st_size field. The two-byte trailer is a grave accent character (`) followed by a <newline> character.
    Only the name field has any provision for overflow. If any filename is more than 16 characters in length or contains an embedded space, the string ``#1/'' followed by the ASCII length of the name is written in the name field. The file size (stored in the archive header) is incremented by the length of the name. The name is then written immediately following the archive header.

    Any unused characters in any of these fields are written as <space> characters. If any fields are their particular maximum number of characters in length, there will be no separation between the fields.

    Objects in the archive are always an even number of bytes long; files that are an odd number of bytes long are padded with a <newline> character, although the size in the header does not reflect this.

History of Decisions Made

The ar utility description requires that (when all its members are valid object files) ar produce an object code library, which the linkage editor can use to extract object modules. If the linkage editor needs a symbol table to permit random access to the archive, ar must provide it; however, ar does not require a symbol table.

The historical -m and -q positioning options were omitted, as were the positioning modifiers formerly associated with the -m and -r options, because the two functions of positioning are handled by ranlib-style symbol tables (ranlib is a utility found on some historical systems to create symbol tables within the archive) and/or the ability of portable applications to create multiple archives instead of loading from a single archive.

Earlier drafts had elaborate descriptions in the Asynchronous Events subclause about how ar caught signals and then resent them to itself.
These were removed in favor of the default case because they are essentially implementation details, unnecessary for the application. Similarly, information about where (and if) temporary files are created was removed from earlier drafts.

The BSD -o option was omitted. It is a rare portable application that will use ar to extract object code from a library with concern for its modification time, since this can only be of importance to make. Hence, since this functionality is not deemed important for application portability, the modification time of the extracted files is set to the current time.

There is at least one known implementation (for a small computer) that can accommodate only object files for that system, disallowing mixed object and other files. The ability to handle any type of file is not only existing practice for most implementations, but is also a reasonable expectation.

Consideration was given to changing the output format of ar -tv to the same format as the output of ls -l. This would have made parsing the output of ar the same as that of ls. This was rejected in part because the current ar format is commonly used and changes would break existing usage. Second, ar gives the user ID and group ID in numeric format separated by a slash; changing this to the user name and group name would not be right if the archive were moved to a machine that contained a different user database. Since ar cannot know whether the archive file was generated on the same machine, it cannot tell what to report.
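The 4.4BSD layout quoted earlier in this rationale can be produced with printf alone. This is a minimal sketch under stated assumptions. ar_header and ar_write are hypothetical names. To keep the sketch short, the modification time, user ID, and group ID are written as 0 and the mode as 100644, rather than being read from each file. Names longer than 16 bytes or containing spaces, which would need the #1/ overflow scheme, are not handled. This illustrates one historical format, not the format POSIX.2 requires, because the standard deliberately leaves the format unspecified.

```sh
#!/bin/sh
# 4.4BSD-style ar layout: "!<arch>\n", then for each member a 60-byte
# header (name 16, mtime 12, uid 6, gid 6, mode 8 in octal, size 10,
# trailer "`\n"), the file contents, and a <newline> pad when the size
# is odd.  Unused field positions are filled with spaces by %-Ns.
ar_header() {   # name mtime uid gid mode size
    printf '%-16s%-12s%-6s%-6s%-8s%-10s`\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

ar_write() {    # file...   (archive is written to standard output)
    printf '!<arch>\n'
    for f do
        size=$(( $(wc -c < "$f") ))      # byte count, blanks stripped
        ar_header "${f##*/}" 0 0 0 100644 "$size"
        cat "$f"
        if [ $(( size % 2 )) -eq 1 ]; then
            printf '\n'                  # padding, not counted in size
        fi
    done
}

# Example (file names are illustrative):
#   printf 'abc' > x.txt && ar_write x.txt > demo.a
```

For a 3-byte member, the archive is 8 + 60 + 3 + 1 = 72 bytes: the magic string, one header, the contents, and one pad byte.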
The text on the -ur option combination is historical practice: since one filename can easily represent two different files (e.g., /a/foo and /b/foo), it is reasonable to replace the member in the archive even when the modification time in the archive is identical to that in the file system.

END_RATIONALE

6.2 make - Maintain, update, and regenerate groups of programs

6.2.1 Synopsis

    make [-einpqrst] [-f makefile] ... [-k | -S] [macro=name] ... [target_name ...]

6.2.2 Description

The make utility can be used as a part of software development to update files that are derived from other files. A typical case is one where object files are derived from the corresponding source files. The make utility examines time relationships and updates those derived files (called targets) that have modified times earlier than the modified times of the files (called prerequisites) from which they are derived. A description file (``makefile'') contains a description of the relationships between files, and the commands that must be executed to update the targets to reflect changes in their prerequisites. Each specification, or rule, shall consist of a target, optional prerequisites, and optional commands to be executed when a prerequisite is newer than the target. There are two types of rules:

- Inference rules, which have one target name with at least one period (.) and no slash (/)
- Target rules, which can have more than one target name

In addition, make shall have a collection of built-in macros and inference rules that infer prerequisite relationships to simplify maintenance of programs.
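The timestamp rule described above (a target is rebuilt when any prerequisite has a later modification time) can be expressed with the -newer primary of find. This is a minimal sketch, not make's algorithm. out_of_date is a hypothetical helper that treats a missing target as out of date and a missing prerequisite as not newer. It ignores everything else make does, such as inference rules, chains of prerequisites, and special targets.

```sh
#!/bin/sh
# out_of_date TARGET PREREQ...
# Succeeds if TARGET does not exist or any PREREQ was modified later
# than TARGET; fails otherwise.
out_of_date() {
    target=$1; shift
    [ -e "$target" ] || return 0
    for p do
        # find prints $p only if it is newer than $target; -prune keeps
        # find from descending when $p is a directory.  A missing $p
        # yields no output (make itself would report an error instead).
        if [ -n "$(find "$p" -prune -newer "$target" 2>/dev/null)" ]; then
            return 0
        fi
    done
    return 1
}

# Example: recompile only when the source is newer than the object.
#   out_of_date prog.o prog.c && c89 -c prog.c
```

Equal modification times count as up to date, which matches the wording above: a target is updated only when its time is earlier than that of a prerequisite.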
To receive exactly the behavior described in this clause, a portable makefile shall:

- Include the special target .POSIX (see 6.2.7.3)
- Omit any special target reserved for implementations (a leading period followed by uppercase letters) that has not been specified by this clause.

The behavior of make is unspecified if either or both of these conditions are not met.

6.2.3 Options

The make utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-e
    Cause environment variables, including those with null values, to override macro assignments within makefiles.

-f makefile
    Specify a different makefile. The argument makefile is a pathname of a description file, which is also referred to as the makefile. A pathname of "-" shall denote the standard input. There can be multiple instances of this option, and they shall be processed in the order specified. The effect of specifying the same option-argument more than once is unspecified. See 6.2.7.1.

-i
    Ignore error codes returned by invoked commands. This mode is the same as if the special target .IGNORE were specified without prerequisites. See 6.2.7.2.

-k
    Continue to update other targets that do not depend on the current target if a nonignored error occurs while executing the commands to bring a target up to date.

-n
    Write commands that would be executed on standard output, but do not execute them. However, lines with a plus-sign (+) prefix shall be executed. In this mode, lines with an at-sign (@) character prefix shall be written to standard output.

-p
    Write to standard output the complete set of macro definitions and target descriptions. The output format is unspecified.
-q
    Return a zero exit value if the target file is up-to-date; otherwise, return an exit value of 1. Targets shall not be updated if this option is specified. However, a command line (associated with the targets) with a plus-sign (+) prefix shall be executed.

-r
    Clear the suffix list and do not use the built-in rules.

-S
    Terminate make if an error occurs while executing the commands to bring a target up-to-date. This shall be the default and the opposite of -k.

-s
    Do not write command lines or touch messages (see -t) to standard output before executing. This mode shall be the same as if the special target .SILENT were specified without prerequisites. See 6.2.7.2.

-t
    Update the modification time of each target as though a touch target had been executed. See touch in 4.63. Targets that have prerequisites but no commands (see 6.2.7.3), or that are already up-to-date, shall not be touched in this manner. Write messages to standard output for each target file indicating the name of the file and that it was touched. Normally, the command lines associated with each target are not executed. However, a command line with a plus-sign (+) prefix shall be executed.

If the -k and -S options are both specified on the command line, by the MAKEFLAGS environment variable, or by the MAKEFLAGS macro, the last one evaluated shall take precedence. The MAKEFLAGS environment variable shall be evaluated first and the command line shall be evaluated second. Assignments to the MAKEFLAGS macro shall be evaluated as described in 6.2.5.3.

6.2.4 Operands

The following operands shall be supported by the implementation:

target_name
    Target names, as defined in 6.2.7. If no target is specified, while make is processing the makefiles, the first target that make encounters that is not a special target or an inference rule shall be used.
  macro=name
      Macro definitions, as defined in 6.2.7.4.

If the target_name and macro=name operands are intermixed on the command line, the results are unspecified.

6.2.5 External Influences

6.2.5.1 Standard Input

The standard input shall be used only if the makefile option-argument is "-". See 6.2.5.2.

6.2.5.2 Input Files

The input file, otherwise known as the makefile, is a text file containing rules, macro definitions, and comments. (See 6.2.7.)

6.2.5.3 Environment Variables

The following environment variables shall affect the execution of make:

  LANG
      This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

  LC_ALL
      This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

  LC_CTYPE
      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

  LC_MESSAGES
      This variable shall determine the language in which messages should be written.

  MAKEFLAGS
      This variable shall be interpreted as a character string representing a series of option characters to be used as the default options. The implementation shall accept both of the following formats (but need not accept them when intermixed):

      (1) The characters are option letters without the leading hyphens or <blank> separation used on a command line.

      (2) The characters are formatted in a manner similar to a portion of the make command line: options are preceded by hyphens and <blank>-separated as described in 2.10.2.
          The macro=name macro definition operands can also be included. The difference between the contents of MAKEFLAGS and the command line is that the contents of the variable shall not be subjected to the word expansions (see 3.6) associated with parsing the command-line values.

      When the command-line options -f or -p are used, they shall take effect regardless of whether they also appear in MAKEFLAGS. If they otherwise appear in MAKEFLAGS, the result is undefined.

      The MAKEFLAGS variable shall be accessed from the environment before the makefile is read. At that time, all of the options (except -f and -p) and command-line macros not already included in MAKEFLAGS shall be added to the MAKEFLAGS macro. The MAKEFLAGS macro shall be passed into the environment as an environment variable for all child processes. If the MAKEFLAGS macro is subsequently set by the makefile, it shall replace the MAKEFLAGS variable currently found in the environment.

The value of the SHELL environment variable shall not be used as a macro and shall not be modified by defining the SHELL macro in a makefile or on the command line. All other environment variables, including those with null values, shall be used as macros, as defined in 6.2.7.4.

6.2.5.4 Asynchronous Events

If not already ignored, make shall trap SIGHUP, SIGTERM, SIGINT, and SIGQUIT and remove the current target unless the target is a directory, the target is a prerequisite of the special target .PRECIOUS, or one of the -n, -p, or -q options was specified. Any targets removed in this manner shall be reported in diagnostic messages of unspecified format, written to standard error. After this cleanup process, if any, make shall take the standard action for all other signals; see 2.11.5.4.
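A few lines of Python can model the two MAKEFLAGS formats together with the rule in 6.2.3 that the last -k or -S evaluated wins (MAKEFLAGS first, then the command line). This is an illustrative sketch and is not part of this standard. The function names parse_makeflags and error_mode are hypothetical. The sketch rejects intermixed formats rather than guessing, and it does not handle option-arguments; the appearance of -f in MAKEFLAGS is undefined in any case.

```python
from __future__ import annotations


def parse_makeflags(value: str) -> tuple[list[str], list[str]]:
    """Split a MAKEFLAGS value into option letters and macro=name operands.

    Hypothetical helper. Format (1) is a single word of bare option
    letters ("ks"); format (2) is hyphenated, <blank>-separated options,
    optionally with macro=name operands ("-k -s CC=c89").
    """
    # MAKEFLAGS is not subjected to word expansion, so a plain split suffices.
    words = value.split()
    if not words:
        return [], []
    first = words[0]
    if len(words) == 1 and not first.startswith("-") and "=" not in first:
        return list(first), []  # format (1): each character is an option
    letters: list[str] = []
    macros: list[str] = []
    for word in words:
        if "=" in word:
            macros.append(word)  # a macro=name operand
        elif word.startswith("-") and len(word) > 1:
            letters.extend(word[1:])  # "-ks" carries two option letters
        else:
            raise ValueError(f"unsupported MAKEFLAGS word: {word!r}")
    return letters, macros


def error_mode(makeflags_letters: list[str], cmdline_letters: list[str]) -> str:
    """Return "k" (keep going) or "S" (stop on error, the default).

    MAKEFLAGS is evaluated first and the command line second, so the
    last -k or -S seen in that sequence takes precedence.
    """
    mode = "S"
    for letter in [*makeflags_letters, *cmdline_letters]:
        if letter in ("k", "S"):
            mode = letter
    return mode


# -k inherited through MAKEFLAGS, then a child invoked as $(MAKE) -S foo.
inherited, _ = parse_makeflags("k")
print(error_mode(inherited, ["S"]))  # S
```

The printed S corresponds to the case described in the rationale: -S given to a child make restores the default behavior even when k arrives through MAKEFLAGS.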
6.2.6 External Effects

6.2.6.1 Standard Output

The make utility shall write all commands to be executed to standard output unless the -s option was specified, the command is prefixed with an at-sign, or the special target .SILENT has either the current target as a prerequisite or has no prerequisites. If make is invoked without any work needing to be done, it shall write a message to standard output indicating that no action was taken.

6.2.6.2 Standard Error

Used only for diagnostic messages.

6.2.6.3 Output Files

None. However, utilities invoked by make may create additional files.

6.2.7 Extended Description

The make utility attempts to perform the actions required to ensure that the specified target(s) are up-to-date. A target is considered out-of-date if it is older than any of its prerequisites or if it does not exist. The make utility shall treat all prerequisites as targets themselves and recursively ensure that they are up-to-date, processing them in the order in which they appear in the rule. The make utility shall use the modification times of files to determine if the corresponding targets are out-of-date. (See 2.9.1.6.)

After make has ensured that all of the prerequisites of a target are up-to-date, and if the target is out-of-date, the commands associated with the target entry shall be executed. If there are no commands listed for the target, the target shall be treated as up-to-date.

6.2.7.1 Makefile Syntax

A makefile can contain rules, macro definitions (see 6.2.7.4), and comments. There are two kinds of rules: inference rules (6.2.7.5) and target rules (6.2.7.3). The make utility shall contain a set of built-in inference rules. If the -r option is present, the built-in rules shall not be used and the suffix list shall be cleared.
Additional rules of both types can be specified in a makefile. If a rule or macro is defined more than once, the value of the rule or macro shall be that of the last one specified. Comments start with a number-sign (#) and continue until an unescaped <newline> is reached.

By default, the file ./makefile shall be used. If ./makefile is not found, the file ./Makefile shall be tried. If neither ./makefile nor ./Makefile is found, other implementation-defined pathnames may also be tried.

The -f option shall direct make to ignore ./makefile and ./Makefile (and any implementation-defined variants) and use the specified argument as a makefile instead. If the argument "-" is specified, standard input shall be used.

The term makefile is used to refer to any rules provided by the user, whether in ./makefile, ./Makefile, or a file specified by the -f option.

The rules in makefiles shall consist of the following types of lines: target rules, including special targets (see 6.2.7.3); inference rules (see 6.2.7.5); macro definitions (see 6.2.7.4); empty lines; and comments.

Comments start with a number sign (#) and continue until an unescaped <newline> is reached.

When an escaped <newline> (one preceded by a backslash) is found anywhere in the makefile, it shall be replaced, along with any leading white space on the following line, with a single <space>.

6.2.7.2 Makefile Execution

Command lines shall be processed one at a time by writing the command line to the standard output (unless one of the conditions listed below under ``@'' suppresses the writing) and executing the command(s) in the line. A <tab> may precede the command written to standard output.
Commands shall be executed by passing the command line to the command interpreter in the same manner as if the string were the argument to the function in 7.1.1 [such as the system() function in the C binding].

The environment for the command being executed shall contain all of the variables in the environment of make. The macros from the command line to make shall be added to make's environment. Other implementation-defined variables may also be added to make's environment. If any command-line macro has been defined elsewhere, the command-line value shall overwrite the existing value. If the MAKEFLAGS variable is not set in the environment in which make was invoked, in the makefile, or on the command line, it shall be created by make, and shall contain all options specified on the command line except for the -f and -p options. It may also contain implementation-defined options.

By default, when make receives a nonzero status from the execution of a command, it terminates with an error message to standard error.

Command lines can have one or more of the following prefixes: a hyphen (-), an at-sign (@), or a plus-sign (+). These modify the way in which make processes the command. When a command is written to standard output, the prefix shall not be included in the output.

  -
      If the command prefix contains a hyphen, or the -i option is present, or the special target .IGNORE has either the current target as a prerequisite or has no prerequisites, any error found while executing the command shall be ignored.
  @
      If the command prefix contains an at-sign and the command-line -n option is not specified, or the -s option is present, or the special target .SILENT has either the current target as a prerequisite or has no prerequisites, the command shall not be written to standard output before it is executed.

  +
      If the command prefix contains a plus-sign, this indicates a command line that shall be executed even if -n, -q, or -t is specified.

6.2.7.3 Target Rules

Target rules are formatted as follows:

      target [target ...]: [prerequisite ...][;command]
      [<tab>command
      <tab>command
      ...]
      (line that does not begin with <tab>)

Target entries are specified by a <blank>-separated, nonnull list of targets, then a colon, then a <blank>-separated, possibly empty list of prerequisites. Text following a semicolon, if any, and all following lines that begin with a <tab>, are command lines to be executed to update the target. The first nonempty line that does not begin with a <tab> or # shall begin a new entry. An empty or blank line, or a line beginning with #, may begin a new entry.

Applications shall select target names from the set of characters consisting solely of periods, underscores, digits, and alphabetics from the portable character set (see 2.4). Implementations may allow other characters in target names as extensions. The interpretation of targets containing the characters ``%'' and ``"'' is implementation defined.

A target that has prerequisites, but does not have any commands, can be used to add to the prerequisite list for that target. Only one target rule for any given target can contain commands.
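Two parts of the processing described so far can be modeled compactly. The first is how the -, @, and + prefixes of 6.2.7.2 combine with the -i, -n, -q, -s, and -t options and with the .IGNORE and .SILENT special targets. The second is how make brings a target rule up-to-date (6.2.7). The following Python sketch is illustrative only. The names command_plan, Rule, and update are hypothetical, and modification times are plain integers. Inference rules, archive members, and real command execution are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def command_plan(line: str, opts: set[str], ignore_target: bool = False,
                 silent_target: bool = False) -> tuple[str, bool, bool, bool]:
    """Return (text, write, execute, ignore_errors) for one command line.

    opts holds the option letters in effect, e.g. {"n"}. ignore_target and
    silent_target are True when .IGNORE or .SILENT has the current target
    as a prerequisite or has no prerequisites.
    """
    prefixes = set()
    while line[:1] in ("-", "@", "+"):
        prefixes.add(line[0])
        line = line[1:]  # prefixes are never written to standard output
    silent = ("@" in prefixes and "n" not in opts) or "s" in opts or silent_target
    execute = "+" in prefixes or not (opts & {"n", "q", "t"})
    ignore = "-" in prefixes or "i" in opts or ignore_target
    return line, not silent, execute, ignore


@dataclass
class Rule:
    prerequisites: list[str] = field(default_factory=list)
    commands: list[str] = field(default_factory=list)


def update(target: str, rules: dict[str, Rule], mtimes: dict[str, int],
           ran: list[tuple[str, list[str]]]) -> int:
    """Bring target up-to-date and return its modification time.

    mtimes maps existing files to times; a missing key means the file does
    not exist. Each rule whose commands would run is appended to ran with
    the list that $? would hold.
    """
    rule = rules.get(target)
    if rule is None:
        if target not in mtimes:
            raise LookupError(f"no rule to make {target}")
        return mtimes[target]  # a plain file with no rule
    # Prerequisites are targets themselves, processed in rule order.
    times = [(p, update(p, rules, mtimes, ran)) for p in rule.prerequisites]
    own = mtimes.get(target)
    newer = [p for p, t in times if own is None or t > own]
    if (own is None or newer) and rule.commands:
        ran.append((target, newer))
        # Model the commands as creating the file later than any other file.
        mtimes[target] = max(mtimes.values(), default=0) + 1
    # A target with no commands is treated as up-to-date; when no such
    # file exists, this sketch reports time 0.
    return mtimes.get(target, 0)


# The pgm makefile from the rationale; only b.c changed since b.o was built.
rules = {
    "pgm": Rule(["a.o", "b.o"], ["c89 a.o b.o -o pgm"]),
    "a.o": Rule(["incl.h", "a.c"], ["c89 -c a.c"]),
    "b.o": Rule(["incl.h", "b.c"], ["c89 -c b.c"]),
}
mtimes = {"incl.h": 1, "a.c": 2, "b.c": 3, "a.o": 5, "b.o": 2, "pgm": 6}
ran = []
update("pgm", rules, mtimes, ran)
print(ran)  # [('b.o', ['b.c']), ('pgm', ['b.o'])]
print(command_plan("@echo done", {"n"}))  # ('echo done', True, False, False)
```

Running update a second time on the same data records nothing, because every target is then newer than its prerequisites.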
Lines that begin with one of the following are called special targets and control the operation of make:

  .DEFAULT
      If the makefile uses this special target, it shall be specified with commands, but without prerequisites. The commands shall be used by make if there are no other rules available to build a target.

  .IGNORE
      Prerequisites of this special target are targets themselves; this shall cause errors from commands associated with them to be ignored in the same manner as specified by the -i option. Subsequent occurrences of .IGNORE shall add to the list of targets ignoring command errors. If no prerequisites are specified, make shall behave as if the -i option had been specified and errors from all commands associated with all targets shall be ignored.

  .POSIX
      This special target shall be specified without prerequisites or commands. If it appears before the first noncomment line in the makefile, make shall process the makefile as specified by this clause; otherwise, the behavior of make is unspecified.

  .PRECIOUS
      Prerequisites of this special target shall not be removed if make receives one of the asynchronous events explicitly described in 6.2.5.4. Subsequent occurrences of .PRECIOUS shall add to the list of precious files. If no prerequisites are specified, all targets in the makefile shall be treated as if specified with .PRECIOUS.

  .SILENT
      Prerequisites of this special target are targets themselves; this shall cause commands associated with them not to be written to the standard output before they are executed. Subsequent occurrences of .SILENT shall add to the list of targets with silent commands. If no prerequisites are specified, make shall behave as if the -s option had been specified and no commands or touch messages associated with any target shall be written to standard output.
  .SUFFIXES
      Prerequisites of .SUFFIXES shall be appended to the list of known suffixes and are used in conjunction with the inference rules (see 6.2.7.5). If .SUFFIXES does not have any prerequisites, the list of known suffixes shall be cleared. Makefiles shall not associate commands with .SUFFIXES.

Targets with names consisting of a leading period followed by the uppercase letters POSIX and then any other characters are reserved for future standardization. Targets with names consisting of a leading period followed by one or more uppercase letters are reserved for implementation extensions.

6.2.7.4 Macros

Macro definitions are in the form:

      string1 = [string2]

The macro named string1 is defined as having the value of string2, where string2 is defined as all characters, if any, after the equals-sign, up to a comment character (#) or an unescaped <newline> character. Any <blank>s immediately before or after the equals-sign shall be ignored.

Subsequent appearances of $(string1) or ${string1} shall be replaced by string2. The parentheses or braces are optional if string1 is a single character. The macro $$ shall be replaced by the single character $.

Applications shall select macro names from the set of characters consisting solely of periods, underscores, digits, and alphabetics from the portable character set (see 2.4). A macro name shall not contain an equals-sign. Implementations may allow other characters in macro names as extensions.

Macros can appear anywhere in the makefile. Macros in target lines shall be evaluated when the target line is read. Macros in command lines shall be evaluated when the command is executed.
Macros in macro definition lines shall not be evaluated until the new macro being defined is used in a rule or command. A macro that has not been defined shall evaluate to a null string without causing any error condition.

The forms $(string1[:subst1=[subst2]]) or ${string1[:subst1=[subst2]]} can be used to replace all occurrences of subst1 with subst2 when the macro substitution is performed. The subst1 to be replaced shall be recognized when it is a suffix at the end of a word in string1 (where a ``word,'' in this context, is defined to be a string delimited by the beginning of the line, a <blank>, or a <newline>).

Macro assignments shall be accepted from the sources listed below, in the order shown. If a macro name already exists at the time it is being processed, the newer definition shall replace the existing definition.

      (1) Macros defined in make's built-in inference rules.
      (2) The contents of the environment, including the variables with null values, in the order defined in the environment.
      (3) Macros defined in the makefile(s), processed in the order specified.
      (4) Macros specified on the command line. It is unspecified whether the internal macros defined in 6.2.7.7 are accepted from the command line.

If the -e option is specified, the order of processing sources (2) and (3) shall be reversed.

The SHELL macro shall be treated specially. It shall be provided by make and set to the pathname of the shell command language interpreter (see sh in 4.56). The SHELL environment variable shall not affect the value of the SHELL macro.
If SHELL is defined in the makefile or is specified on the command line, it shall replace the original value of the SHELL macro, but shall not affect the SHELL environment variable. Other effects of defining SHELL in the makefile or on the command line are implementation defined.

6.2.7.5 Inference Rules

Inference rules are formatted as follows:

      target:
      <tab>command
      [<tab>command]
      ...
      (line that does not begin with <tab> or #)

The target portion shall be a valid target name (see 6.2.7.3) and shall be of the form .s2 or .s1.s2 (where .s1 and .s2 are suffixes that have been given as prerequisites of the .SUFFIXES special target and s1 and s2 do not contain any slashes or periods). If there is only one period in the target, it is a single-suffix inference rule. Targets with two periods are double-suffix inference rules. Inference rules can have only one target before the colon.

The makefile shall not specify prerequisites for inference rules; no characters other than white space shall follow the colon in the first line, except when creating the ``empty rule,'' described below. Prerequisites are inferred, as described below.

Inference rules can be redefined. A target that matches an existing inference rule shall overwrite the old inference rule. An ``empty rule'' can be created with a command consisting of simply a semicolon (that is, the rule still exists and is found during inference rule search, but since it is empty, execution has no effect). The empty rule also can be formatted as follows:

      rule: ;

where zero or more <blank>s separate the colon and semicolon.

The make utility uses the suffixes of targets and their prerequisites to infer how a target can be made up-to-date.
A list of inference rules defines the commands to be executed. By default, make contains a built-in set of inference rules. Additional rules can be specified in the makefile.

The special target .SUFFIXES contains as its prerequisites a list of suffixes that are to be used by the inference rules. The order in which the suffixes are specified defines the order in which the inference rules for the suffixes are used. New suffixes shall be appended to the current list by specifying a .SUFFIXES special target in the makefile. A .SUFFIXES target with no prerequisites shall clear the list of suffixes. An empty .SUFFIXES target followed by a new .SUFFIXES list is required to change the order of the suffixes.

Normally, the user would provide an inference rule for each suffix. The inference rule to update a target with a suffix .s1 from a prerequisite with a suffix .s2 is specified as a target .s2.s1. The internal macros provide the means to specify general inference rules. (See 6.2.7.7.)

When no target rule is found to update a target, the inference rules shall be checked. The suffix of the target (.s1) to be built is compared to the list of suffixes specified by the .SUFFIXES special targets. If the .s1 suffix is found in .SUFFIXES, the inference rules shall be searched in the order defined for the first .s2.s1 rule whose prerequisite file ($*.s2) exists. If the target is out-of-date with respect to this prerequisite, the commands for that inference rule shall be executed.

If the target to be built does not contain a suffix and there is no rule for the target, the single-suffix inference rules shall be checked. The single-suffix inference rules define how to build a target if a file is found with a name that matches the target name with one of the single suffixes appended. A rule with one suffix .s2 is the definition of how to build target from target.s2. The other suffix (.s1) is treated as null.
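The search order just described can be expressed as a short Python function. This is a sketch under stated assumptions and is not normative text. find_inference_rule is a hypothetical name, the defined inference rules are given as a set of rule names, and file existence is supplied as a predicate. The sketch also assumes that a target "contains a suffix" when it ends in one of the .SUFFIXES entries. Whether the chosen rule's commands then run still depends on the out-of-date check.

```python
from __future__ import annotations

from typing import Callable, Optional


def find_inference_rule(target: str, suffixes: list[str], rules: set[str],
                        exists: Callable[[str], bool]) -> Optional[tuple[str, str]]:
    """Return (rule name, implicit prerequisite) for target, or None."""
    for s1 in suffixes:
        if target.endswith(s1) and len(target) > len(s1):
            stem = target[: -len(s1)]  # the value $* would have
            # Double-suffix rules .s2.s1, tried in .SUFFIXES order; the
            # first one whose prerequisite $*.s2 exists is chosen.
            for s2 in suffixes:
                if s2 + s1 in rules and exists(stem + s2):
                    return s2 + s1, stem + s2
            return None
    # No known suffix: try single-suffix rules .s2 (target from target.s2).
    for s2 in suffixes:
        if s2 in rules and exists(target + s2):
            return s2, target + s2
    return None


# The default suffix list from 6.2.7.8 and some of the default rule names.
SUFFIXES = [".o", ".c", ".y", ".l", ".a", ".sh", ".f"]
RULES = {".c", ".sh", ".c.o", ".y.o", ".l.o"}
files = {"main.c", "parse.y", "parse.c", "tool.sh"}
print(find_inference_rule("main.o", SUFFIXES, RULES, files.__contains__))
# ('.c.o', 'main.c')
print(find_inference_rule("parse.o", SUFFIXES, RULES, files.__contains__))
# ('.c.o', 'parse.c')
print(find_inference_rule("tool", SUFFIXES, RULES, files.__contains__))
# ('.sh', 'tool.sh')
```

In the default list, .c comes before .y, so parse.o is inferred from parse.c even though parse.y also exists. Changing that preference requires an empty .SUFFIXES target followed by a new list.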
6.2.7.6 Libraries

If a target or prerequisite contains parentheses, it shall be treated as a member of an archive library. For the lib(member.o) expression, lib refers to the name of the archive library and member.o to the member name. The member shall be an object file with the .o suffix. The modification time of the expression is the modification time for the member as kept in the archive library. See 6.1.

The .a suffix refers to an archive library. The .s2.a rule is used to update a member in the library from a file with a suffix .s2.

6.2.7.7 Internal Macros

The make utility shall maintain five internal macros that can be used in target and inference rules. In order to clearly define the meaning of these macros, some clarification of the terms ``target rule,'' ``inference rule,'' ``target,'' and ``prerequisite'' is necessary.

Target rules are specified by the user in a makefile for a particular target. Inference rules are user- or make-specified rules for a particular class of target names. Explicit prerequisites are those prerequisites specified in a makefile on target lines. Implicit prerequisites are those prerequisites that are generated when inference rules are used. Inference rules are applied to implicit prerequisites or to explicit prerequisites that do not have target rules defined for them in the makefile. Target rules are applied to targets specified in the makefile.

Before any target in the makefile is updated, each of its prerequisites (both explicit and implicit) shall be updated. This shall be accomplished by recursively processing each prerequisite. Upon recursion, each prerequisite shall become a target itself.
Its prerequisites in turn shall be processed recursively until a target is found that has no prerequisites, at which point the recursion shall stop. The recursion then shall back up, updating each target as it goes.

In the definitions that follow, the word ``target'' refers to one of:

  - A target specified in the makefile,
  - An explicit prerequisite specified in the makefile that becomes the target when make processes it during recursion, or
  - An implicit prerequisite that becomes a target when make processes it during recursion.

In the definitions that follow, the word ``prerequisite'' refers to either:

  - An explicit prerequisite specified in the makefile for a particular target, or
  - An implicit prerequisite generated as a result of locating an appropriate inference rule and corresponding file that matches the suffix of the target.

The five internal macros are:

  $@
      The $@ macro shall evaluate to the full target name of the current target, or the archive filename part of a library archive target. It shall be evaluated for both target and inference rules.

      For example, in the .c.a inference rule, $@ represents the out-of-date .a file to be built. Similarly, in a makefile target rule to build lib.a from file.c, $@ represents the out-of-date lib.a.

  $%
      The $% macro shall be evaluated only when the current target is an archive library member of the form libname(member.o). In these cases, $@ shall evaluate to libname and $% shall evaluate to member.o. The $% macro shall be evaluated for both target and inference rules.

      For example, in a makefile target rule to build lib.a(file.o), $% represents file.o, as opposed to $@, which represents lib.a.

  $?
      The $? macro shall evaluate to the list of prerequisites that are newer than the current target. It shall be evaluated for both target and inference rules.

      For example, in a makefile target rule to build prog from file1.o, file2.o, and file3.o, where prog is not out-of-date with respect to file1.o but is out-of-date with respect to file2.o and file3.o, $? represents file2.o and file3.o.

  $<
      In an inference rule, $< shall evaluate to the filename whose existence allowed the inference rule to be chosen for the target. In the .DEFAULT rule, the $< macro shall evaluate to the current target name. The $< macro shall be evaluated only for inference rules.

      For example, in the .c.a inference rule, $< represents the prerequisite .c file.

  $*
      The $* macro shall evaluate to the current target name with its suffix deleted. It shall be evaluated at least for inference rules.

      For example, in the .c.a inference rule, $*.o represents the out-of-date .o file that corresponds to the prerequisite .c file.

Each of the internal macros has an alternate form. When an uppercase D or F is appended to any of the macros, the meaning is changed to the directory part for D and the filename part for F. The directory part is the path prefix of the file without a trailing slash; for the current directory, the directory part is ".". When the $? macro contains more than one prerequisite filename, the $(?D) and $(?F) [or ${?D} and ${?F}] macros expand to a list of directory name parts and filename parts, respectively.

For the target lib(member.o) and the .s2.a rule, the internal macros are defined as:

      $<   member.s2
      $*   member
      $@   lib
      $?   member.s2
      $%   member.o

6.2.7.8 Default Rules

The default rules for make shall achieve results that are the same as if the following were used. Implementations that do not support the C Language Development Utilities Option may omit CC, CFLAGS, YACC, YFLAGS, LEX, LFLAGS, LDFLAGS, and the .c, .y, and .l inference rules. Implementations that do not support the FORTRAN Language Development Utilities Option may omit FC, FFLAGS, and the .f inference rules. Implementations may provide additional macros and rules.

NOTE: In a future version of this standard, the default rules may be specified separately from the make clause, such as with the language-dependent development options.

SUFFIXES AND MACROS

.SUFFIXES: .o .c .y .l .a .sh .f

MAKE=make
AR=ar
ARFLAGS=-rv
YACC=yacc
YFLAGS=
LEX=lex
LFLAGS=
LDFLAGS=
CC=c89
CFLAGS=-O
FC=fort77
FFLAGS=-O

SINGLE SUFFIX RULES

.c:
	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $<

.f:
	$(FC) $(FFLAGS) $(LDFLAGS) -o $@ $<

.sh:
	cp $< $@
	chmod a+x $@

DOUBLE SUFFIX RULES

.c.o:
	$(CC) $(CFLAGS) -c $<

.f.o:
	$(FC) $(FFLAGS) -c $<

.y.o:
	$(YACC) $(YFLAGS) $<
	$(CC) $(CFLAGS) -c y.tab.c
	rm -f y.tab.c
	mv y.tab.o $@

.l.o:
	$(LEX) $(LFLAGS) $<
	$(CC) $(CFLAGS) -c lex.yy.c
	rm -f lex.yy.c
	mv lex.yy.o $@

.y.c:
	$(YACC) $(YFLAGS) $<
	mv y.tab.c $@

.l.c:
	$(LEX) $(LFLAGS) $<
	mv lex.yy.c $@

.c.a:
	$(CC) -c $(CFLAGS) $<
	$(AR) $(ARFLAGS) $@ $*.o
	rm -f $*.o

.f.a:
	$(FC) -c $(FFLAGS) $<
	$(AR) $(ARFLAGS) $@ $*.o
	rm -f $*.o

6.2.8 Exit Status

When the -q option is specified, the make utility shall exit with one of the following values:

  0    Successful completion.
  1    The target was not up-to-date.
  >1   An error occurred.

When the -q option is not specified, the make utility shall exit with one of the following values:

  0    Successful completion.
  >0   An error occurred.

6.2.9 Consequences of Errors

Default.

BEGIN_RATIONALE

6.2.10 Rationale. (This subclause is not a part of P1003.2)

The make provided here is intended to provide the means for changing portable source code into runnable executables on a POSIX.2 system. It reflects the most common features present in System V and BSD makes.

Historically, the make utility has been an especially fertile ground for vendor- and research-organization-specific syntax modifications and extensions. Examples include:

  - Syntax supporting parallel execution (Sequent, Cray, GNU, and others)
  - Additional ``operators'' separating targets and their prerequisites (System V, BSD, and others)
  - Specifying that command lines containing the strings ${MAKE} and $(MAKE) are executed when the -n option is specified (GNU and System V)
  - Modifications of the meaning of internal macros when referencing libraries (BSD and others)
  - Using a single instance of the shell for all of a target's command lines (BSD and others)
  - Allowing spaces as well as tabs to delimit command lines (BSD)
  - Adding C-preprocessor-style ``include'' and ``ifdef'' constructs (System V, GNU, BSD, and others)
  - Remote execution of command lines (Sprite and others)
  - Specifying additional special targets (Sun, BSD, System V, and most others).

Additionally, many vendors and research organizations have rethought the basic concepts of make, creating vastly extended, as well as completely new, syntaxes. Each of these versions of ``make'' fulfills the needs of a different community of users; it is unreasonable for this standard to require behavior that would be incompatible with (and probably inferior to) existing practice for such a community.

In similar circumstances, when existing formats have been so incompatible as to be irreconcilable, POSIX.2 has followed one or both of two courses of action: commands have been renamed (cksum, echo, and pax), and/or command-line options have been provided to select the desired behavior (grep, od, and pax).

Because the syntax specified for the make utility is, by and large, a subset of the syntaxes accepted by almost all versions of make, it was decided that it would be counterproductive to change the name. And since the makefile itself is a basic unit of portability, it would not be completely effective to reserve a new option letter, such as make -P, to achieve the portable behavior.
Therefore, the special target .POSIX was added to the makefile, allowing users to specify ``standard'' behavior. This special target does not preclude extensions in the make utility, or such extensions being used by the makefile specifying the target; it does, however, preclude any extensions from being applied that could alter the behavior of previously valid syntax; such extensions must be controlled via command-line options or new special targets. It is incumbent upon portable makefiles to specify the .POSIX special target in order to guarantee that they are not affected by local extensions.

The portable version of make described in this clause is not intended to be the state-of-the-art software generation tool and, as such, some newer and more leading-edge features have not been included. An attempt has been made to describe the portable makefile in a manner that does not preclude such extensions as long as they do not disturb the portable behavior described here.

One use of this make and the makefile syntax is as a format that newer versions of make can generate for portability purposes.

Examples, Usage

The following command:

     make

makes the first target found in the makefile.

The following command:

     make junk

makes the target junk.

The following makefile says that pgm depends on two files, a.o and b.o, and that they in turn depend on their corresponding source files (a.c and b.c), and a common file incl.h:

     pgm: a.o b.o
             c89 a.o b.o -o pgm
     a.o: incl.h a.c
             c89 -c a.c
     b.o: incl.h b.c
             c89 -c b.c

An example for making optimized .o files from .c files is:

     .c.o:
             c89 -c -O $*.c

or:

     .c.o:
             c89 -c -O $<
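The two .c.o rules above are interchangeable because, when a single-suffix inference rule is applied to a target such as a.o, $* names the target with its suffix removed and $< names the implied source file, so both $*.c and $< expand to a.c. The C fragment below is only a sketch of that derivation under this simple assumption (no search paths or other extensions); it is not taken from any make implementation, and suffix_rule_macros() is a hypothetical helper introduced for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: for a single-suffix inference rule such as .c.o
   (FROM_SUFFIX ".c", TO_SUFFIX ".o") applied to TARGET, store in STEM the
   value of $* (TARGET with TO_SUFFIX removed) and in SOURCE the value of
   $< (the stem followed by FROM_SUFFIX).  Returns 0 on success, or -1 if
   TARGET does not end in TO_SUFFIX or a buffer is too small. */
static int suffix_rule_macros(const char *target,
                              const char *from_suffix, const char *to_suffix,
                              char *stem, size_t stem_size,
                              char *source, size_t source_size)
{
    size_t target_len = strlen(target);
    size_t suffix_len = strlen(to_suffix);

    /* The rule applies only to a nonempty stem followed by TO_SUFFIX. */
    if (suffix_len >= target_len ||
        strcmp(target + target_len - suffix_len, to_suffix) != 0)
        return -1;

    size_t stem_len = target_len - suffix_len;
    if (stem_len >= stem_size)
        return -1;
    memcpy(stem, target, stem_len);            /* $*, e.g. "a" for a.o */
    stem[stem_len] = '\0';

    int n = snprintf(source, source_size, "%s%s", stem, from_suffix);
    if (n < 0 || (size_t)n >= source_size)     /* $<, e.g. "a.c" */
        return -1;
    return 0;
}
```

For example, calling suffix_rule_macros("a.o", ".c", ".o", ...) stores a in the stem buffer and a.c in the source buffer, which is why c89 -c -O $*.c and c89 -c -O $< name the same file.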
The most common use of the archive interface follows. Here, it is assumed that the source files are all C language source:

     lib: lib(file1.o) lib(file2.o) lib(file3.o)
             @echo lib is now up-to-date

The .c.a rule is used to make file1.o, file2.o, and file3.o and insert them into lib.

The -k and -S options are both present so that the relationship between the command line, the MAKEFLAGS variable, and the makefile can be controlled precisely. If the k flag is passed in MAKEFLAGS and a command is of the form:

     $(MAKE) -S foo

then the default behavior is restored for the child make.

When the -n option is specified, it is always added to MAKEFLAGS. This allows a recursive make -n target to be used to see all of the actions that would be taken to update target.

The definition of MAKEFLAGS allows both the System V letter string and the BSD command-line formats. The two formats are sufficiently different to allow implementations to support both without ambiguity.

Because of widespread historical practice, interpreting a # number sign inside a variable as the start of a comment has the unfortunate side effect of making it impossible to place a number sign in a variable, thus forbidding something like:

     CFLAGS = "-D COMMENT_CHAR='#'"

Earlier drafts stated that an ``unquoted'' number sign was treated as the start of a comment. The make utility does not pay any attention to quotes. A number sign starts a comment regardless of its surroundings.

The treatment of escaped <newline>s throughout the makefile is historical practice. For example, the inference rule:

     .c.o\
     :

works and the macro

     f= bar baz\
         biz
     a:
             echo ==$f==

will echo ==bar baz biz==.

If $?
were:

     /usr/include/stdio.h /usr/include/unistd.h foo.h

then $(?D) would be:

     /usr/include /usr/include .

and $(?F) would be:

     stdio.h unistd.h foo.h

The contents of the built-in rules can be viewed by running:

     make -p -f /dev/null 2>/dev/null

Many historical makes stop chaining together inference rules when an intermediate target is nonexistent. For example, it might be possible for a make to determine that both .y.c and .c.o could be used to convert a .y to a .o. Instead, in this case, make requires the use of a .y.o rule.

The text about ``other implementation-defined pathnames may also be tried'' in addition to ./makefile and ./Makefile is to allow such extensions as SCCS/s.Makefile and other variations. It was made an implementation-defined requirement (as opposed to unspecified behavior) to highlight surprising implementations that might select something unexpected like /etc/Makefile.

For inference rules, the descriptions of $< and $? seem similar. However, an example shows the minor difference. In a makefile containing:

     foo.o: foo.h

if foo.h is newer than foo.o, yet foo.c is older than foo.o, the built-in rule to make foo.o from foo.c will be used, with $< equal to foo.c and $? equal to foo.h. (If foo.c is also newer than foo.o, $< is equal to foo.c and $? is equal to ``foo.h foo.c''.)

History of Decisions Made

Earlier drafts contained the macro NPROC as a means of specifying that make should use n processes to do the work required. While this feature is a valuable extension for many systems, it is not common usage and could require other nontrivial extensions to makefile syntax. This extension is not required by the standard, but could be provided as a compatible extension.
The macro PARALLEL is used by some historical systems with essentially the same meaning (but without using a name that is a common system limit value). It is suggested that implementors recognize the existing use of NPROC and/or PARALLEL as extensions to make.

The default rules are based on System V. The default CC= value is c89 instead of cc because POSIX.2 does not standardize the utility named cc. If the default were cc, every conforming application would be required to define CC=c89 in order to be expected to run. There is no advantage conferred by the hope that the makefile might hit the ``preferred'' compiler because there is no way that this can be guaranteed to work. Also, since the portable makescript can only use the c89 options, no advantage is conferred in terms of what the script can do. It is a quality of implementation issue as to whether c89 is as good as cc.

Since SCCS and RCS are not part of POSIX.2, all make references to SCCS extensions have been omitted.

The -d option to make is frequently used to produce debugging information, but is too implementation-dependent to add to the standard.

The -p option is not passed in MAKEFLAGS on most existing implementations, and to change this would cause many implementations to break without sufficiently increased portability.

Commands that begin with a plus sign (+) are executed even if the -n option is present. Based on the GNU version of make, the behavior of -n when the plus-sign prefix is encountered has been extended to apply to -q and -t as well. However, the System V convention of forcing command execution with -n when a target's command line contains either of the strings $(MAKE) or ${MAKE} has not been adopted.
This functionality appeared in earlier drafts, but the danger of this approach was pointed out with the following example of a portion of a makefile:

     subdir:
             cd subdir; rm all_the_files; $(MAKE)

The loss of the System V behavior in this case is well-balanced by the safety afforded to other makefiles that were not aware of this situation. In any event, the command-line plus-sign prefix can provide the desired functionality.

The double colon in the target rule format is supported in BSD systems to allow more than one target line containing the same target name to have commands associated with it. Since this is not functionality described in the SVID or XPG3, it has been allowed as an extension, but not mandated.

The default rules are provided with text specifying that the built-in rules are to be the same as if the listed set were used. The intent is that implementations should be able to use the rules without change, but will be allowed to alter them in ways that do not affect the primary behavior.

The best way to provide portable makefiles is to include all of the rules needed in the makefile itself. The rules provided use only features provided by other parts of the standard. The default rules include rules for optional commands in the standard. Only rules pertaining to commands that are provided are needed in an implementation's default set.

The argument could be made to drop the default rules list from the standard. They provide convenience, but do not enhance portability of applications. The prime benefit is in portability of users who wish to type make command and have the command build from a command.c file.

The historical MAKESHELL feature was omitted.
In some implementations it is used to provide a way of letting a user override the shell to be used to run make commands. This was confusing; for a portable make, the shell should be chosen by the makefile writer or specified on the make command line and not by a user running make.

The make utilities in most historical implementations process the prerequisites of a target in left-to-right order, and the POSIX.2 makefile format requires this. It supports the standard idiom used in many makefiles that produce yacc programs, for example:

     foo: y.tab.o lex.o main.o
             $(CC) $(CFLAGS) -o $@ y.tab.o lex.o main.o

In this example, if make chose any arbitrary order, lex.o might not be made with the correct y.tab.h. Although there may be better ways to express this relationship, it is widely used historically. Implementations that desire to update prerequisites in parallel should require an explicit extension to make or the makefile format to accomplish it, as described previously.

The algorithm for determining a new entry for target rules is partially unspecified. Some historical makes allow blank, empty, or comment lines within the collection of commands marked by leading <tab>s. A conforming makefile must ensure that each command starts with a <tab>, but implementations are free to ignore blank, empty, and comment lines without triggering the start of a new entry.

The Asynchronous Events subclause includes having SIGTERM and SIGHUP, along with the more traditional SIGINT and SIGQUIT, remove the current target unless directed not to. SIGTERM and SIGHUP were added to parallel other utilities that have historically cleaned up their work as a result of these signals.
For every one of these signals except SIGQUIT, make is required to resend itself the signal it received, so that it exits with a status that reflects the signal. The results from SIGQUIT are partially unspecified because, on systems that create core files upon receipt of SIGQUIT, the core from make would conflict with a core file from the command that was running when the SIGQUIT arrived. The main concern here was to prevent damaged files from appearing up-to-date when make is rerun.

The .PRECIOUS special target was extended to globally affect all targets (by specifying no prerequisites). The .IGNORE and .SILENT special targets were extended to allow prerequisites; it was judged to be more useful in some cases to be able to turn off errors or echoing for a list of targets than for the entire makefile. These extensions to System V's make were made to match historical practice from the BSD make.

Macros are not exported to the environment of commands to be run. This was never the case in any historical make and would have serious consequences. The environment is the same as the environment to make except that MAKEFLAGS and macros defined on the make command line are added.

Some implementations do not use system() for all command lines, as required by the POSIX.2 portable makefile format; as a performance enhancement, they select lines without shell metacharacters for direct execution by execve(). There is no requirement that system() be used specifically, but merely that the same results be achieved. The metacharacters typically used to bypass the direct execve() execution have been any of:

     = | ^ ( ) ; & < > * ? [ ] : $ ` ' " \ \n

The default in some advanced versions of make is to group all the command lines for a target and execute them using a single shell invocation; the System V method is to pass each line individually to a separate shell. The single-shell method has the advantages of better performance and of not requiring many continued lines.
However, converting to this newer method has caused portability problems with many historical makefiles, so the behavior with the POSIX makefile is specified to be the same as System V's. It is suggested that the special target .ONESHELL be used as an implementation extension to achieve the single-shell grouping for a target or group of targets.

Novice users of make have had difficulty with the historical need to start commands with a <tab> character. Since it is often difficult to discern differences between <tab> and <space> characters on terminals or printed listings, confusing bugs can arise. In earlier drafts, an attempt was made to correct this problem by allowing leading <space>s instead of <tab>s. However, implementors reported many makefiles that failed in subtle ways following this change, and it is difficult to implement a make that can unambiguously differentiate between macro and command lines. There is extensive historical practice of allowing leading spaces before macro definitions. Forcing macro lines into column 1 would be a significant backward-compatibility problem for some makefiles. Therefore, historical practice was restored.

The System V INCLUDE feature was considered, but not included. This would treat a line that began in the first column and contained INCLUDE <filename> as an indication to read <filename> at that point in the makefile. This is difficult to use in a portable way and it raises concerns about nesting levels and diagnostics. System V, BSD, GNU, and others have used different methods for including files.

Macros used within other macros are evaluated when the new macro is used rather than when the new macro is defined.
Therefore:

     MACRO = value1
     NEW = $(MACRO)
     MACRO = value2

     target:
             echo $(NEW)

would produce value2 and not value1 since NEW was not expanded until it was needed in the echo command line.

The System V dynamic dependency feature was not added. It would support:

     cat: $$@.c

that would expand to

     cat: cat.c

This feature exists only in the new version of System V make and, while useful, is not in wide usage. Supporting it would mean that macros are expanded twice for prerequisites: once at makefile parse time and once at target update time.

Consideration was given to adding metarules to the POSIX make. This would make "%.o: %.c" the same as ".c.o:". This is quite useful and available from some vendors, but it would cause too many changes to this make to support. It would have introduced rule chaining and new substitution rules. However, the rules for target names have been set to reserve the % and " characters. These are traditionally used to implement metarules and quoting of target names, respectively. Implementors are strongly encouraged to use these characters only for these purposes.

A request was made to extend the suffix delimiter character from a period to any character. The metarules in newer makes solve this problem in a more general way. POSIX.2 is staying with the more conservative historical definition until a clear industry consensus on make technology might prompt a revision of this standard.

The standard output format for the -p option is not described because it is primarily a debugging option and the format is not generally useful to programs. In historical implementations the output is not suitable for use in generating makefiles. The -p format has been variable across historical implementations.
Therefore, the definition of -p was only to provide a consistently named option for obtaining make script debugging information.

Some historical implementations have not cleared the suffix list with -r.

Implementations should be aware that some historical applications have intermixed target_name and macro=name operands on the command line, expecting that all of the macros will be processed before any of the targets are dealt with. Portable applications do not do this, but some backward compatibility support may be warranted.

Empty inference rules are specified with a semicolon command rather than omitting all commands, as described in a previous draft. The latter case has no traditional meaning and is reserved for implementation extensions, such as in GNU make.

END_RATIONALE

6.3 strip - Remove unnecessary information from executable files

6.3.1 Synopsis

     strip file ...

6.3.2 Description

The strip utility shall remove from executable files named by the file operands any information the implementor deems unnecessary to proper execution of those files. The nature of that information is unspecified.

The effect of strip shall be the same as the use of the -s option to any of the compilers defined by this standard.

6.3.3 Options

None.

6.3.4 Operands

The following operand shall be supported by the implementation:

     file    A pathname referring to an executable file.

6.3.5 External Influences

6.3.5.1 Standard Input

None.

6.3.5.2 Input Files

The input files shall be in the form of executable files successfully produced by any compiler defined by this standard.
6.3.5.3 Environment Variables

The following environment variables shall affect the execution of strip:

     LANG         This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

     LC_ALL       This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

     LC_CTYPE     This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments).

     LC_MESSAGES  This variable shall determine the language in which messages should be written.

6.3.5.4 Asynchronous Events

Default.

6.3.6 External Effects

6.3.6.1 Standard Output

None.

6.3.6.2 Standard Error

Used only for diagnostic messages.

6.3.6.3 Output Files

The strip utility shall produce executable files of unspecified format.

6.3.7 Extended Description

None.

6.3.8 Exit Status

The strip utility shall exit with one of the following values:

     0    Successful completion.
     >0   An error occurred.

6.3.9 Consequences of Errors

Default.

BEGIN_RATIONALE

6.3.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

None.

History of Decisions Made

Historically, this utility has been used to remove the symbol table from an executable file.
It was included since it is known that the symbolic information can amount to several megabytes; the ability to remove it in a portable manner was deemed important, especially for smaller systems.

The behavior of strip is said to be the same as the -s option to a compiler. While the end result is essentially the same, it is not required to be identical. The same effect can be achieved with either -s during a compile or a strip on the final object file.

END_RATIONALE

Section 7: Language-Independent System Services

This clause contains functional specifications for services that give applications access to features defined elsewhere in this standard. These services allow applications written in high-level languages to:

     (1) execute commands using the shell language,
     (2) obtain values of environment variables,
     (3) perform regular expression and pattern matching,
     (4) process command arguments in a standard manner,
     (5) generate pathnames from a pattern,
     (6) perform shell word expansions,
     (7) obtain system configuration information, and
     (8) set locale control information.

This clause does not define interfaces, but services that shall be provided by the interfaces in a language-dependent binding. This clause is optional, in that an implementation is not required to support any language binding to these services. However, any language binding shall support all of the services described here. Implementations therefore provide support for services in this clause by supplying a language-dependent binding such as the one defined in Annex B. Such a system would specify conformance to the language-dependent binding, not to the language-independent bindings given here.

BEGIN_RATIONALE
7.0.1 Language-Independent System Services Rationale. (This subclause is not a part of P1003.2)

Section 7 essentially is a metastandard, in that it specifies services that must be in a language-dependent binding. An implementation conforms to a specific language-dependent binding such as for the C language, in Annex B, and the language-dependent binding must conform to the specifications in this clause.

In this standard, the language-independent specifications have not yet been developed. The language-independent syntax is being created in parallel by the POSIX.1 working group. Therefore, the C language bindings temporarily described in Annex B are actually the full interface specifications. It is the intention of the P1003.2 working group to rectify this situation in a later supplement by moving the majority of the interface specifications back into this clause, leaving Annex B with only brief descriptions of the C bindings to those services.

This clause does not attempt to include everything that would be required of a language binding. The services here are those that are necessary to make use of features defined elsewhere in the standard, but that are not normally available in every language. Clearly a language that could not open, read, and write the files manipulated by the utilities in this standard would not be very useful, but this service is normally provided by any language and therefore isn't called out here. The ability to obtain values of environment variables exported from the shell, on the other hand, is not universally available, so that service is included here.

END_RATIONALE

7.1 Shell Command Interface

7.1.1 Execute Shell Command

Any language binding to Language-Independent System Services shall include a facility to execute a shell command. The language-independent specification for this facility has not been developed.
The C binding for this facility is the system() function described in B.3.1.

7.1.2 Pipe Communications with Programs

Any language binding to Language-Independent System Services shall include a facility to execute a shell command, and to write the standard input or read the standard output of that command via a pipe. The language-independent specification for this facility has not been developed. The C binding for this facility consists of the popen() and pclose() functions described in B.3.2.

7.2 Access Environment Variables

Any language binding to Language-Independent System Services shall include a facility to obtain values of environment variables, as specified in POSIX.1 {8}. The language-independent specification for this facility has not been developed. The C binding for this facility is the getenv() function described in POSIX.1 {8} 4.6.1.

BEGIN_RATIONALE

7.2.1 Access Environment Variables Rationale. (This subclause is not a part of P1003.2)

This facility is required in POSIX.2 so that applications can obtain values of exported shell variables.

END_RATIONALE

7.3 Regular Expression Matching

Any language binding to Language-Independent System Services shall include a facility to interpret regular expressions as described in 2.8. The language-independent specification for this facility has not been developed. The C binding consists of the regcomp(), regexec(), and regfree() functions described in B.5.

BEGIN_RATIONALE

7.3.1 Regular Expression Matching Rationale.
(This subclause is not a part of P1003.2)

This service is important enough that it should be required by any language binding to POSIX.2. Regular expression parsing and pattern matching are listed separately, since they are different services. A language binding could provide different functions to support regular expressions and patterns, or could combine them into a single function.

END_RATIONALE

7.4 Pattern Matching

Any language binding to Language-Independent System Services shall include a facility to interpret patterns as described in 3.13.1 and 3.13.2. This facility shall allow the application to specify whether a slash character in the string to be matched will be treated as a regular character, or must be explicitly matched against a slash in the pattern. The language-independent specification for this facility has not been developed. The C binding is the fnmatch() function described in B.6.

7.5 Command Option Parsing

Any language binding to Language-Independent System Services shall include a facility to parse the options and operands from the command line that invoked the application. The language-independent specification for this facility has not been developed. The C binding for this facility is the getopt() function described in B.7.

7.6 Generate Pathnames Matching a Pattern

Any language binding to Language-Independent System Services shall include a facility to generate pathnames matching a pattern as described in 3.13. The language-independent specification for this facility has not been developed. The C binding consists of the glob() and globfree() functions described in B.8.
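As an illustration of the pattern services in 7.4 and 7.6, the following sketch calls the C bindings named above, fnmatch() and glob(). The wrapper names pattern_matches() and list_matching_paths() are hypothetical, and the flag choices (FNM_PATHNAME to obtain the slash rule of 7.4, no glob() flags) are assumptions made for this example rather than anything this clause requires.

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper for 7.4: returns 1 if STRING matches PATTERN.
   When SLASH_IS_SPECIAL is nonzero, FNM_PATHNAME is passed, so a slash in
   STRING matches only an explicit slash in PATTERN, never '*', '?', or a
   bracket expression.  Any nonzero fnmatch() result counts as no match. */
static int pattern_matches(const char *pattern, const char *string,
                           int slash_is_special)
{
    return fnmatch(pattern, string, slash_is_special ? FNM_PATHNAME : 0) == 0;
}

/* Hypothetical wrapper for 7.6: prints each existing pathname that
   matches PATTERN, one per line, and returns how many there were.
   Returns 0 when nothing matched and -1 if glob() reported an error. */
static long list_matching_paths(const char *pattern)
{
    glob_t g;
    long count;

    memset(&g, 0, sizeof g);   /* keeps globfree() safe even if glob() bails out early */
    int rc = glob(pattern, 0, NULL, &g);
    if (rc == 0) {
        for (size_t i = 0; i < g.gl_pathc; i++)
            printf("%s\n", g.gl_pathv[i]);
        count = (long)g.gl_pathc;
    } else if (rc == GLOB_NOMATCH) {
        count = 0;
    } else {
        count = -1;            /* e.g. GLOB_NOSPACE or GLOB_ABORTED */
    }
    globfree(&g);              /* release storage allocated by glob() */
    return count;
}
```

With the slash rule off, the pattern *.c matches the string src/a.c because * may match the slash; with it on, only a pattern such as src/*.c matches. Note that fnmatch() only compares strings, while glob() consults the file system and returns pathnames that actually exist.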
7.7 Perform Word Expansions

Any language binding to Language-Independent System Services shall include a facility to do shell word expansions as described in 3.6. The language-independent specification for this facility has not been developed. The C binding consists of the wordexp() and wordfree() functions described in B.9.

BEGIN_RATIONALE

7.7.1 Perform Word Expansions Rationale. (This subclause is not a part of P1003.2)

See the rationale for this function in B.9.

END_RATIONALE

7.8 Get POSIX Configurable Variables

7.8.1 Get String-Valued Configurable Variables

Any language binding to Language-Independent System Services shall include a facility to obtain string configurable variables. The language-independent specification for this facility has not been developed. The C binding for this facility is the confstr() function described in B.10.1.

7.8.2 Get Numeric-Valued Configurable Variables

Any language binding to Language-Independent System Services shall include facilities to determine the current values of system and pathname limits or options (variables), as specified by POSIX.1 {8}. The configurable variables listed in Table 7-1, which are defined in POSIX.1 {8}, shall be available in any POSIX.2 language-dependent binding, with minimum values as given in POSIX.1 {8}. Other POSIX.1 {8} configurable variables may be supported, but are not required by POSIX.2. This facility shall also make available current values for all system limits defined in 2.13.

The language-independent specifications for these facilities have not been developed. The C bindings are the sysconf() function described in POSIX.1 {8} 4.8, and the pathconf() and fpathconf() functions defined in POSIX.1 {8} 5.7.
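A short sketch of querying two of the Table 7-1 variables through the sysconf() and pathconf() bindings named above. It assumes the POSIX.1 C names _SC_ARG_MAX and _PC_NAME_MAX; the wrappers system_limit() and path_limit() and the limit_status enumeration are hypothetical. The point it illustrates is that a return of -1 is ambiguous, so errno is cleared before each call to distinguish a variable with no determinate limit from an invalid request.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <unistd.h>

/* Possible outcomes of a configurable-variable query. */
enum limit_status { LIMIT_VALUE, LIMIT_INDETERMINATE, LIMIT_ERROR };

/* Classify a sysconf()/pathconf() result.  Both return -1 without
   changing errno when a variable has no determinate limit, and -1 with
   errno set when the request itself is invalid. */
static enum limit_status classify(long result)
{
    if (result != -1)
        return LIMIT_VALUE;
    return errno == 0 ? LIMIT_INDETERMINATE : LIMIT_ERROR;
}

/* Hypothetical wrapper: query a system limit such as _SC_ARG_MAX. */
static enum limit_status system_limit(int name, long *value)
{
    errno = 0;                 /* cleared so classify() can detect a change */
    *value = sysconf(name);
    return classify(*value);
}

/* Hypothetical wrapper: query a pathname limit such as _PC_NAME_MAX
   for the file system containing PATH. */
static enum limit_status path_limit(const char *path, int name, long *value)
{
    errno = 0;
    *value = pathconf(path, name);
    return classify(*value);
}
```

A determinate value returned this way should never be smaller than the corresponding POSIX.1 minimum, such as _POSIX_ARG_MAX or _POSIX_NAME_MAX from <limits.h>.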
BEGIN_RATIONALE

7.8.2.1 Get Numeric-Valued Configurable Variables Rationale. (This subclause is not a part of P1003.2)

This description calls out specific values that sysconf(), pathconf(), and fpathconf() are required to support. Some of the POSIX.1 {8} values are excluded from this list because they are not relevant in a POSIX.2-only environment. Currently, only {CLK_TCK} is not required by POSIX.2.

This description does not specify the name values for the arguments to the various functions. This is because different language bindings might use different naming conventions, or might use a completely different scheme for obtaining the required configurable values. Specific names for the name values for the C language binding are given in B.10.2.

END_RATIONALE

7.9 Locale Control

Any language binding to Language-Independent System Services shall include a facility to set locale control information. The language-independent specification for this facility has not been developed. The C binding for this facility is described in B.11.

BEGIN_RATIONALE

7.9.0.1 Locale Control Rationale. (This subclause is not a part of P1003.2)

This facility is required in POSIX.2 so that applications can control the locale, which affects the operation of POSIX.2 utilities.

END_RATIONALE
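The locale-control service can be pictured with the ISO C setlocale() function. This sketch assumes that the C binding referred to in B.11 behaves like setlocale(), which this clause does not itself show. Passing an empty string requests the locale named by the environment, following the same precedence given for the utilities (LC_ALL first, then the individual LC_* variable, then LANG); init_locale() is a hypothetical name.

```c
#include <locale.h>

/* Hypothetical helper: adopt the locale named by the environment.  If
   that locale is not available, fall back to the portable "C" locale.
   Returns the name of the LC_CTYPE locale now in effect. */
static const char *init_locale(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        setlocale(LC_ALL, "C");          /* requested locale unavailable */
    return setlocale(LC_CTYPE, NULL);    /* NULL queries without changing */
}
```

Calling such a helper once at program start lets an application interpret characters and messages the same way the POSIX.2 utilities run under the same environment would.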
Table 7-1 - POSIX.1 Numeric-Valued Configurable Variables
______________________________________________________________
{ARG_MAX}        {NAME_MAX}        {_POSIX_CHOWN_RESTRICTED}
{CHILD_MAX}      {NGROUPS_MAX}     {_POSIX_JOB_CONTROL}
{LINK_MAX}       {OPEN_MAX}        {_POSIX_NO_TRUNC}
{MAX_CANON}      {PATH_MAX}        {_POSIX_SAVED_IDS}
{MAX_INPUT}      {PIPE_BUF}        {_POSIX_VDISABLE}
______________________________________________________________

Annex A
(normative)
C Language Development Utilities Option

This annex describes utilities used for the development of C language applications, including compilation or translation of C source code and complex program generators for simple lexical tasks and processing of context-free grammars.

The utilities described in this annex may be provided by the conforming system; however, any system claiming conformance to the C Language Development Utilities Option shall provide all of the utilities described here.

The utilities described in Section 6 are prerequisites to this annex.

BEGIN_RATIONALE
A.0.1 C Language Development Utilities Option Rationale. (This subclause is not a part of P1003.2)

The portions of this standard that concern specific languages (currently C and FORTRAN) have been collected at the rear of the document as Normative Annexes. For purposes of conformance, they are no less a part of the standard than any of the numbered sections. They were grouped as Annexes to illustrate that the base standard is [planned to be] language independent, giving a small degree of separation.
The working group also wished to send a message to those groups planning other language bindings: the standard is not C-oriented, and there's plenty of room to add more annexes for your languages as you develop them, right alongside C and FORTRAN.
END_RATIONALE

A.1 c89 - Compile Standard C programs

A.1.1 Synopsis

c89 [-c] [-D name[=value]] ... [-E] [-g] [-I directory] ... [-L directory] ...
    [-o outfile] [-O] [-s] [-U name] ... operand ...

A.1.2 Description

The c89 utility is the interface to the standard C compilation system; it shall accept source code conforming to the C Standard {7}. The system conceptually consists of a compiler and link editor. The files referenced by operands shall be compiled and linked to produce an executable file. (It is unspecified whether the linking occurs entirely within the operation of c89; some systems may produce objects that are not fully resolved until the file is executed.)

If the -c option is specified, for all pathname operands of the form file.c, the files

    $(basename pathname .c).o

shall be created as the result of successful compilation. If the -c option is not specified, it is unspecified whether such .o files are created or deleted for the file.c operands.

If there are no options that prevent link editing (such as -c or -E), and all operands compile and link without error, the resulting executable file shall be written according to the -o outfile option (if present) or to the file a.out. The executable file shall be created as specified in 2.9.1.4, except that the file permissions shall be set to S_IRWXO | S_IRWXG | S_IRWXU (see 5.6.1.2 in POSIX.1 {8}) and that the bits specified by the umask of the process shall be cleared.
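A.1.2 contains two mechanical rules: the name of the object file kept under -c, and the permission bits given to the executable. Both can be expressed as a short C sketch.

The sketch illustrates the rules only and assumes a simple string-based basename. It is not taken from any c89 implementation, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Mimic $(basename pathname .c).o: strip any directory prefix, then
   replace the trailing ".c" with ".o". Writes into buf and returns it,
   or NULL if the operand is not of the form file.c or buf is too small. */
char *object_name_for(const char *pathname, char *buf, size_t size)
{
    assert(pathname != NULL && buf != NULL);
    const char *base = strrchr(pathname, '/');
    base = (base != NULL) ? base + 1 : pathname;

    size_t len = strlen(base);
    if (len < 3 || strcmp(base + len - 2, ".c") != 0)
        return NULL;                   /* not a file.c operand */
    if (len + 1 > size)
        return NULL;                   /* no room for name plus null byte */

    memcpy(buf, base, len - 2);        /* the stem, without ".c" */
    strcpy(buf + len - 2, ".o");
    return buf;
}

/* Permission bits for the executable: S_IRWXO | S_IRWXG | S_IRWXU with
   the umask bits cleared. umask() can only be read by setting it, so the
   previous value is restored immediately. */
mode_t executable_mode(void)
{
    mode_t mask = umask(0);
    umask(mask);
    return (S_IRWXU | S_IRWXG | S_IRWXO) & ~mask;
}
```

With a umask of 022, for example, executable_mode() yields 0755. Likewise, object_name_for("src/foo.c", ...) yields "foo.o" in the current directory, as basename does.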
A.1.3 Options

The c89 utility shall conform to the utility argument syntax guidelines described in 2.10.2, except that:

- The -l library operands have the format of options, but their position within a list of operands affects the order in which libraries are searched.

- The order of specifying the -I and -L options is significant.

- Conforming applications shall specify each option separately; that is, grouping option letters (e.g., -cO) need not be recognized by all implementations.

The following options shall be supported by the implementation:

-c
    Suppress the link-edit phase of the compilation, and do not remove any object files that are produced.

-g
    Produce symbolic information in the object or executable files; the nature of this information is unspecified, and may be modified by implementation-defined interactions with other options.

-s
    Produce object and/or executable files from which symbolic and other information not required for proper execution using exec (see POSIX.1 {8} 3.1.2) has been removed (stripped). If both -g and -s options are present, the action taken is unspecified.

-o outfile
    Use the pathname outfile, instead of the default a.out, for the executable file produced. If the -o option is present with -c or -E, the result is unspecified.

-D name[=value]
    Define name as if by a C-language #define directive. If no =value is given, a value of 1 shall be used. The -D option has lower precedence than the -U option. That is, if name is used in both a -U and a -D option, name shall be undefined regardless of the order of the options. Additional implementation-defined names may be provided by the compiler. Implementations shall support at least 2048 bytes of -D definitions and 256 names.
-E
    Copy C-language source files to the standard output, expanding all preprocessor directives; no compilation shall be performed. If any operand is not a text file, the effects are unspecified.

-I directory
    Change the algorithm for searching for headers whose names are not absolute pathnames to look in the directory named by the directory pathname before looking in the usual places. Thus, headers whose names are enclosed in double-quotes ("") shall be searched for first in the directory of the file with the #include line, then in directories named in -I options, and last in the usual places. For headers whose names are enclosed in angle brackets (<>), the header shall be searched for only in directories named in -I options and then in the usual places. Directories named in -I options shall be searched in the order specified. Implementations shall support at least ten instances of this option in a single c89 command invocation.

-L directory
    Change the algorithm for searching for the libraries named in the -l operands to look in the directory named by the directory pathname before looking in the usual places. Directories named in -L options shall be searched in the order specified. Implementations shall support at least ten instances of this option in a single c89 command invocation. If a directory specified by a -L option contains files named libc.a, libm.a, libl.a, or liby.a, the results are unspecified.

-O
    Optimize. The nature of the optimization is unspecified.

-U name
    Remove any initial definition of name.

Multiple instances of the -D, -I, -U, and -L options can be specified.

A.1.4 Operands

An operand is either in the form of a pathname or the form -l library.
At least one operand of the pathname form shall be specified. The following operands shall be supported by the implementation:

file.c
    A C-language source file to be compiled and optionally linked. The operand shall be of this form if the -c option is used.

file.a
    A library of object files typically produced by ar (see 6.1), and passed directly to the link editor. Implementations may recognize implementation-defined suffixes other than .a as denoting object file libraries.

file.o
    An object file produced by c89 -c, and passed directly to the link editor. Implementations may recognize implementation-defined suffixes other than .o as denoting object files.

The processing of other files is implementation defined.

-l library
    (The letter ell.) Search the library named:

        liblibrary.a

    A library shall be searched when its name is encountered, so the placement of a -l operand is significant. Several standard libraries can be specified in this manner, as described in A.1.7. Implementations may recognize implementation-defined suffixes other than .a as denoting libraries.

A.1.5 External Influences

A.1.5.1 Standard Input

None.

A.1.5.2 Input Files

The input file shall be one of the following: a text file containing a C-language source program; an object file in the format produced by c89 -c; or a library of object files, in the format produced by archiving zero or more object files, using ar. Implementations may supply additional utilities that produce files in these formats. Additional input file formats are implementation defined.
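Together, the -L options and a -l operand describe an ordered directory search. The sketch below models only that search, under these assumptions:

- Libraries are static archives named liblibrary.a.
- The implementation's "usual places" are represented by a hypothetical two-entry list.
- find_library is a hypothetical name.

Real link editors may also consider other implementation-defined suffixes. The header search described for -I follows the same first-match pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the implementation's "usual places". */
static const char *const usual_places[] = { "/usr/lib", "/lib" };

/* Resolve a "-l library" operand: build "lib<library>.a" and try each
   -L directory in command-line order, then the usual places. The first
   existing file wins. Returns 0 and leaves the path in out, or -1. */
int find_library(const char *library, const char *const *ldirs,
                 size_t nldirs, char *out, size_t outsize)
{
    assert(library != NULL && out != NULL);
    size_t nusual = sizeof usual_places / sizeof usual_places[0];

    for (size_t i = 0; i < nldirs + nusual; i++) {
        const char *dir = (i < nldirs) ? ldirs[i] : usual_places[i - nldirs];
        int n = snprintf(out, outsize, "%s/lib%s.a", dir, library);
        if (n < 0 || (size_t)n >= outsize)
            continue;                  /* path would not fit: skip it */
        if (access(out, F_OK) == 0)
            return 0;                  /* first match in search order */
    }
    return -1;                         /* not found anywhere */
}
```

Consider the later rationale example, c89 -L /a/a/a -L /a/b/c ... -l p. Under this model, that command finds /a/a/a/libp.a because /a/a/a is listed first.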
A.1.5.3 Environment Variables

The following environment variables shall affect the execution of c89:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

TMPDIR
    This variable shall be interpreted as a pathname that should override the default directory for temporary files, if any.

A.1.5.4 Asynchronous Events

Default.

A.1.6 External Effects

A.1.6.1 Standard Output

If more than one file operand ending in .c (or possibly other unspecified suffixes) is given, for each such file:

    "%s:\n", <file>

may be written. These messages, if written, shall precede the processing of each input file; they shall not be written to standard output if they are written to standard error, as described in A.1.6.2.

If the -E option is specified, the standard output shall be a text file that represents the results of the preprocessing stage of the language; it may contain extra information appropriate for subsequent compilation passes.

A.1.6.2 Standard Error

Used only for diagnostic messages.
If more than one file operand ending in .c (or possibly other unspecified suffixes) is given, for each such file:

    "%s:\n", <file>

may be written to allow identification of the diagnostic and warning messages with the appropriate input file. These messages, if written, shall precede the processing of each input file; they shall not be written to the standard error if they are written to the standard output, as described in A.1.6.1.

This utility may produce warning messages about certain conditions that do not warrant returning an error (nonzero) exit value.

A.1.6.3 Output Files

Object files or executable files or both are produced in unspecified formats.

A.1.7 Extended Description

A.1.7.1 Standard Libraries

The c89 utility shall recognize the following -l operands for standard libraries:

-l c
    This library contains all library functions referenced in , , , , , , , , and , except for those functions referenced in . If an invocation of getconf _POSIX_VERSION exits with a status of zero, the library searched also shall include all functions defined by POSIX.1 {8}; if the status is nonzero, it is unspecified whether these functions are available. If an invocation of getconf _POSIX2_C_BIND exits with a status of zero, the library searched also shall include all functions specified in Annex B; if the status is nonzero, it is unspecified whether these functions are available. An implementation shall not require this operand to be present to cause a search of this library.

-l m
    This library contains all functions referenced in . An implementation may search this library in the absence of this operand.

-l l
    This library contains all functions required by the C-language output of lex (see A.2) that are not made available through the -l c operand.
-l y
    This library contains all functions required by the C-language output of yacc (see A.3) that are not made available through the -l c operand.

In the absence of options that inhibit invocation of the link editor, such as -c or -E, the c89 utility shall cause the equivalent of a -l c operand to be passed to the link editor as the last -l operand, causing it to be searched after all other object files and libraries are loaded.

It is unspecified whether the libraries libc.a, libm.a, libl.a, and liby.a exist as regular files. The implementation may accept as -l operands names of objects that do not exist as regular files.

A.1.7.2 External Symbols

The C compiler and link editor shall support the significance of external symbols up to a length of at least 31 bytes; the action taken upon encountering symbols exceeding the implementation-defined maximum symbol length is unspecified.

The compiler and link editor shall support a minimum of 511 external symbols per source or object file, and a minimum of 4095 external symbols total. A diagnostic message shall be written to the standard output if the implementation-defined limit is exceeded; other actions are unspecified.

A.1.8 Exit Status

The c89 utility shall exit with one of the following values:

 0    Successful compilation or link edit.
>0    An error occurred.

A.1.9 Consequences of Errors

When c89 encounters a compilation error that causes an object file not to be created, it shall write a diagnostic to standard error and continue to compile other source code operands, but it shall not perform the link phase and shall return a nonzero exit status. If the link edit is unsuccessful, a diagnostic message shall be written to standard error and c89 shall exit with a nonzero status.

BEGIN_RATIONALE
A.1.10 Rationale.
(This subclause is not a part of P1003.2)

Examples, Usage

Note that some implementations support a finer-grained model of compilation than the one described above. In this model, the following conceptual phases may exist: preprocessor, compiler, optimizer, assembler, and link editor. Such implementations may support these additional options to the c89 utility:

-P
    Preprocess, but do not compile, the named C programs and leave the result on corresponding files suffixed .i.

-S
    Compile the named C programs into assembly language, and leave the assembler-language output on corresponding files suffixed .s. No object files are created.

[-Wc,arg1[,arg2 ...]]
    Hand off the argument(s) argi to phase c, where c is one of [p02al], indicating preprocessor, compiler, optimizer, assembler, or link editor, respectively. For example, -Wa,-m passes -m to the assembler phase. (Note the rationale concerning -W in 2.10.1.1.)

The -fpq options have been excluded, since they use features that are not in this standard.

In specifying that file.a operands are typically produced by ar, it is the intention of POSIX.2 to require that object libraries produced by ar be usable by c89, but not to preclude an implementation from supplying another utility that creates object library files.

The following are examples of usage:

c89 -o foo foo.c
    Compiles foo.c and creates the executable foo.

c89 -c foo.c
    Compiles foo.c and creates the object file foo.o.

c89 foo.c
    Compiles foo.c and creates the executable a.out.

c89 foo.c bar.o
    Compiles foo.c, links it with bar.o, and creates the executable a.out. May also create and leave foo.o.
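The rationale later in this subclause says a portable application should rely on the exit status of c89, not on the existence or mode of the output file. A C program that drives the first usage example above might therefore look like this sketch.

The sketch assumes c89 is reachable through PATH. compile_with_c89 is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the equivalent of "c89 -o outfile source" and judge success
   only by the exit status, never by whether outfile exists.
   Returns 1 on success and 0 on any failure. */
int compile_with_c89(const char *outfile, const char *source)
{
    assert(outfile != NULL && source != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return 0;                        /* could not create a process */
    if (pid == 0) {
        execlp("c89", "c89", "-o", outfile, source, (char *)NULL);
        _exit(127);                      /* exec failed: c89 not found */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return 0;                    /* lost track of the child */
    }
    /* Only a normal exit with status 0 counts as success (A.1.8). */
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```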
The following examples clarify the use and interactions of -L options and -l operands.

Consider the case in which module a.c calls function f() in library libQ.a, and module b.c calls function g() in library libp.a. Assume that both libraries reside in /a/b/c. The command line to compile and link in the desired way is:

    c89 -L /a/b/c main.o a.c -l Q b.c -l p

In this case the -l Q operand need only precede the first -l p operand, since both libQ.a and libp.a reside in the same directory.

Multiple -L options can be used when library name collisions occur. Building on the previous example, suppose that we now want to use a new libp.a, in /a/a/a, but we still want f() from /a/b/c/libQ.a:

    c89 -L /a/a/a -L /a/b/c main.o a.c -l Q b.c -l p

In this example, the linker searches the -L options in the order specified, and finds /a/a/a/libp.a before /a/b/c/libp.a when resolving references for b.c. The order of the -l operands is still important, however.

There is the possible implication that, if a user supplies versions of the standard library functions (before they would be encountered by an implicit -l c or explicit -l m), those versions would be used in place of the standard versions. There are various reasons this might not be true (functions defined as macros, manipulations for clean namespace, etc.), so the existence of files named in the same manner as the standard libraries within the -L directories is explicitly stated to produce unspecified behavior.

Some historical implementations have permitted -L options to be interspersed with -l operands on the command line; with respect to POSIX, such behavior would be considered a vendor extension.
For an application to compile consistently on systems that do not behave like this, it is necessary for a conforming application to supply all -L options before any of the -l operands.

Some historical implementations have created .o files when -c is not specified and more than one source file is given. Since this area is left unspecified, the application cannot rely on .o files being created, but it also must be prepared for any related .o files that already exist being deleted at the completion of the link edit.

History of Decisions Made

The name of this utility differs from the historical cc name. The C Standard {7} document was approved during the development of POSIX.2, and it is clear that POSIX must support Standard C; there is no other good way of specifying a C language. The support of the C Standard {7} by c89 also mandates the Standard C math libraries.

An alternative approach was considered: provide an option to select the type of compilation required. However, it was found that all available option letters were already in use in the various historical cc utilities. Thus, this name change is being used essentially as a switch.

There was some temptation to use the name change as an excuse to mandate a cleaner interface (e.g., conform to the utility syntax guidelines), but this was resisted; the majority of early c89 implementations are expected to be satisfied by historical ccs with only minimal changes. This was decided more from the standpoint of existing applications and makefiles than for the implementors' sake.

The -l library operand must be capable of being interspersed with file name operands so that the order in which libraries are searched by the link editor can be specified.
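A library is searched when its -l operand is encountered. That rule can be illustrated with a deliberately simplified one-pass model.

This sketch uses hypothetical names. It tracks only two things: the references made by object files and the symbols defined by libraries. It ignores definitions in object files and references made by library members, so it is a teaching model rather than a link editor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_UNDEF 32

/* One command-line operand in the simplified model. */
struct operand {
    bool is_library;
    const char *const *syms;  /* object: symbols it references;
                                 library: symbols it defines */
    size_t nsyms;
};

/* Process operands left to right. An object adds its references to the
   undefined list; a library, when encountered, resolves only the
   symbols still undefined at that point. Returns the number left. */
size_t count_unresolved(const struct operand *ops, size_t nops)
{
    const char *undef[MAX_UNDEF];
    size_t nundef = 0;

    for (size_t i = 0; i < nops; i++) {
        const struct operand *op = &ops[i];
        for (size_t s = 0; s < op->nsyms; s++) {
            if (!op->is_library) {
                assert(nundef < MAX_UNDEF);
                undef[nundef++] = op->syms[s];   /* new reference */
                continue;
            }
            /* Library: drop every pending reference to this symbol. */
            size_t k = 0;
            for (size_t u = 0; u < nundef; u++)
                if (strcmp(undef[u], op->syms[s]) != 0)
                    undef[k++] = undef[u];
            nundef = k;
        }
    }
    return nundef;
}
```

Apply the model to the a.c/libQ.a example above. The order a.c -l Q b.c -l p leaves nothing unresolved. Placing -l Q before a.c leaves f() unresolved, because libQ.a has already been passed when the reference appears.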
The search algorithm for -I directory states that the directory of the file with the #include line is searched first, rather than leaving this implementation defined. It is believed that this reflects most implementations, and specifying it disallows variations among implementations, which would make it very difficult to distribute source code in a compatible form. The -I options are searched in the order specified (which is left to right in English). This resolves the conflict of which header file is used if multiple files with the same name exist in different directories in the include path.

In a future extension or supplement to this standard, the word "should" will be changed to "shall" with respect to support for TMPDIR by applications.

It is unclear whether c89 requires such a large number of file descriptors that its requirement should be documented here; POSIX.2 remains silent on the issue. It is also noted that an undocumented feature of some C compilers is that if file descriptor 9 is open, a linkage trace is written to it.

There is no pseudo-printf() specification for compile errors because no common format could be identified. As new C compilers are written, they are encouraged to use the following format:

    "%s: %s: %d %s\n", <compiler phase>, <file name>, <line number>, <explanation>

The following option proposals were considered and rejected:

(1) The -M option in BSD does not exist in System V, and is not seen to enhance application portability.

(2) The -S option was not seen to enhance application portability, and makes assumptions about the underlying architecture.

Earlier drafts included a -v option to select a compiler version. Not only did this letter (and every other upper- and lowercase letter) collide with one historical implementation or another, but there was no agreement on how many compiler versions should be defined, or what they should mean.
Another choice would have been to specify that the cc utility invoke a Standard C compiler. By specifying c89 instead, an installation is able to link either a "common usage" or a Standard C compiler to the name cc. Implementors are free to provide implementation-defined options that select (nonportable) extensions to their existing C compiler to aid the transition to Standard C.

The -g and -s options are not specified as mutually exclusive. Historically these two options have been mutually exclusive, but because both are so loosely specified, it seemed cleaner to leave their interaction unspecified.

The -E option was added because headers are not required to be separate files in a POSIX.1-conformant system; their contents could be hard-coded into the compiler, or might only be accessible in a nonportable way. Hence, while not strictly required for application portability, this option is a practical necessity as a portable means for ascertaining the real effects of preprocessor statements.

In BSD systems, using -c and -o in the same command causes the object module to be stored in the specified file. In System V, this produces an error condition. Therefore, POSIX.2 indicates that this is an unspecified condition.

Reasonably precise specification of standard library access is required. Implementations are not required to have /usr/lib/libc.a, etc., as many historical implementations do, but if not they are required to recognize c, m, l, and y as tokens. Libraries l and y can be empty if the library functions specified for lex and yacc are accessible through the -l c operand. Historically, these libraries have been necessary, but they are not required for a conforming implementation.

External symbol size limits are in a normative subclause; portable applications need to know these limits.
However, the minimum maximum symbol length should be taken as a constraint on a portable application, not on an implementation, and consequently the action taken for a symbol exceeding the limit is unspecified. The minimum size for the external symbol table was added for similar reasons.

The Consequences of Errors subclause clearly specifies the compiler's behavior when compilation or link-edit errors occur. The behavior of several historical implementations was examined, and the choice was made to be silent on the status of the executable, or a.out, file in the face of compiler or linker errors. If a linker writes the executable file, then links it on disk with lseek()s and write()s, the partially linked executable can be left on disk and its execute bits turned off if the link edit fails. However, if the linker links the image in memory before writing the file to disk, it need not touch the executable file (if it already exists) if the link edit fails. Since both approaches are existing practice, a portable application shall rely on the exit status of c89, rather than on the existence or mode of the executable file.

The requirement that portable applications specify compiler options separately is to reserve the multicharacter option namespace for vendor-specific compiler options, which are known to exist in many historical implementations. Implementations are not required to recognize, for example, -gc as if it were -g -c; nor are they forbidden from doing so. The synopsis shows all of the options separately to highlight this requirement on applications.

Echoing filenames to standard error is considered a diagnostic message, because it might otherwise be difficult to associate an error message with the erring file.
The text specifies either standard error or standard output for these messages because some historical practice uses standard output, but there was considerable sentiment expressed for allowing it to be on standard error instead. The rationale for using standard output is that these are not really error message headers, but a running progress report on which files have been processed. The messages are described as optional because there might be different ways of constructing the compiler's messages that should not be precluded.
END_RATIONALE

A.2 lex - Generate programs for lexical tasks

A.2.1 Synopsis

lex [-t] [ -n | -v ] [file ...]

Obsolescent Version:

lex -c [-t] [ -n | -v ] [file ...]

A.2.2 Description

The lex utility shall generate C programs to be used in lexical processing of character input, and that can be used as an interface to yacc (see A.3). The C programs shall be generated from lex source code and conform to the C Standard {7}. Usually, the lex utility writes the program it generates to the file lex.yy.c; the state of this file is unspecified if lex exits with a nonzero exit status. See A.2.7 for a complete description of the lex input language.

A.2.3 Options

The lex utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-c
    (Obsolescent.) Indicate C-language action (default option).

-n
    Suppress the summary of statistics usually written with the -v option. If no table sizes are specified in the lex source code and the -v option is not specified, then -n is implied.

-t
    Write the resulting program to standard output instead of lex.yy.c.

-v
    Write a summary of lex statistics to the standard output.
    (See the discussion of lex table sizes in A.2.7.1.) If the -t option is specified and -n is not specified, this report shall be written to standard error. If table sizes are specified in the lex source code, and if the -n option is not specified, the -v option may be enabled.

A.2.4 Operands

The following operand shall be supported by the implementation:

file
    A pathname of an input file. If more than one such file is specified, all files shall be concatenated to produce a single lex program. If no file operands are specified, or if a file operand is -, the standard input shall be used.

A.2.5 External Influences

A.2.5.1 Standard Input

The standard input shall be used if no file operands are specified, or if a file operand is -. See Input Files.

A.2.5.2 Input Files

The input files shall be text files containing lex source code, as described in A.2.7.

A.2.5.3 Environment Variables

The following environment variables shall affect the execution of lex:

LANG
    This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
    This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_COLLATE
    This variable shall determine the locale for the behavior of ranges, equivalence classes, and multicharacter collating elements within regular expressions. If this variable is not set to the POSIX Locale, the results are unspecified.
LC_CTYPE
    This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files) and the behavior of character classes within extended regular expressions. If this variable is not set to the POSIX Locale, the results are unspecified.

LC_MESSAGES
    This variable shall determine the language in which messages should be written.

A.2.5.4 Asynchronous Events

Default.

A.2.6 External Effects

A.2.6.1 Standard Output

If the -t option is specified, the text file of C source code output of lex shall be written to standard output.

If the -t option is not specified:

(1) Implementation-defined informational, error, and warning messages concerning the contents of lex source code input shall be written to either the standard output or standard error.

(2) If the -v option is specified and the -n option is not specified, lex statistics shall also be written to either the standard output or standard error, in an implementation-defined format. These statistics may also be generated if table sizes are specified with a % operator in the Definitions section (see A.2.7), as long as the -n option is not specified.

A.2.6.2 Standard Error

If the -t option is specified, implementation-defined informational, error, and warning messages concerning the contents of lex source code input shall be written to the standard error.

If the -t option is not specified:

(1) Implementation-defined informational, error, and warning messages concerning the contents of lex source code input shall be written to either the standard output or standard error.
(2) If the -v option is specified and the -n option is not specified, lex statistics shall also be written to either the standard output or standard error, in an implementation-defined format. These statistics may also be generated if table sizes are specified with a % operator in the Definitions section (see A.2.7), as long as the -n option is not specified.

A.2.6.3 Output Files

A text file containing C source code shall be written to lex.yy.c, or to the standard output if the -t option is present.

A.2.7 Extended Description

Each input file contains lex source code, which is a table of regular expressions with corresponding actions in the form of C program fragments.

When lex.yy.c is compiled and linked with the lex library (using the -l l operand with c89), the resulting program reads character input from the standard input and partitions it into strings that match the given expressions.

When an expression is matched, these actions shall occur:

- The input string that was matched is left in yytext as a null-terminated string; yytext is either an external character array or a pointer to a character string. As explained in A.2.7.1, the type can be explicitly selected using the %array or %pointer declarations, but the default is implementation defined.
- The external int yyleng is set to the length of the matching string.
- The expression's corresponding program fragment, or action, is executed.

During pattern matching, lex shall search the set of patterns for the single longest possible match. Among rules that match the same number of characters, the rule given first shall be chosen.
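The longest-match rule and its first-rule tie-break can be illustrated with a small C sketch. To keep it short, each rule is modeled as a literal string rather than an ERE (a real lex scanner compiles its EREs into a finite automaton), and the function name choose_rule is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the rule that would be chosen at the start of
 * input, storing the length of its match in *match_len, or return -1
 * if no rule matches.  Simplification: every rule is a literal string,
 * not an extended regular expression. */
int choose_rule(const char *const rules[], size_t nrules,
                const char *input, size_t *match_len)
{
    int best = -1;
    size_t best_len = 0;

    for (size_t i = 0; i < nrules; i++) {
        size_t len = strlen(rules[i]);
        if (len == 0 || strncmp(input, rules[i], len) != 0)
            continue;                  /* this rule does not match here */
        /* Only a strictly longer match replaces the current choice, so
         * among equal-length matches the earliest rule is kept. */
        if (best < 0 || len > best_len) {
            best = (int)i;
            best_len = len;
        }
    }
    if (best >= 0 && match_len != NULL)
        *match_len = best_len;
    return best;
}
```

The tie-break is why a keyword rule listed before a general identifier rule (as in the Pascal-like example in the rationale) wins when the input is exactly a keyword: both rules match the same number of characters, and the earlier rule is chosen.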
The general format of lex source is:

Definitions
%%
Rules
%%
User Subroutines

The first %% is required to mark the beginning of the rules (regular expressions and actions); the second %% is required only if user subroutines follow.

Any line in the Definitions section beginning with a <blank> shall be assumed to be a C program fragment and shall be copied to the external definition area of the lex.yy.c file. Similarly, anything in the Definitions section included between delimiter lines containing only %{ and %} shall also be copied unchanged to the external definition area of the lex.yy.c file.

Any such input (beginning with a <blank> or within %{ and %} delimiter lines) appearing at the beginning of the Rules section before any rules are specified shall be written to lex.yy.c after the declarations of variables for the yylex() function and before the first line of code in yylex(). Thus, user variables local to yylex() can be declared here, as well as application code to execute upon entry to yylex().

The action taken by lex when encountering any input beginning with a <blank> or within %{ and %} delimiter lines appearing in the Rules section but coming after one or more rules is undefined. The presence of such input may result in an erroneous definition of the yylex() function.

A.2.7.1 lex Definitions

Definitions appear before the first %% delimiter. Any line in this section not contained between %{ and %} lines and not beginning with a <blank> shall be assumed to define a lex substitution string. The format of these lines is:

name substitute

If a name does not meet the requirements for identifiers in the C Standard {7}, the result is undefined.
The string substitute shall replace the string {name} when it is used in a rule. The name string shall be recognized in this context only when the braces are provided and when it does not appear within a bracket expression or within double-quotes.

In the Definitions section, any line beginning with a % (percent-sign) character and followed by an alphanumeric word beginning with either s or S shall define a set of start conditions. Any line beginning with a % followed by a word beginning with either x or X shall define a set of exclusive start conditions. When the generated scanner is in a %s state, patterns with no state specified shall also be active; in a %x state, such patterns shall not be active. The rest of the line, after the first word, shall be considered to be one or more <blank>-separated names of start conditions. Start condition names shall be constructed in the same way as definition names. Start conditions can be used to restrict the matching of regular expressions to one or more states as described in A.2.7.4.

Implementations shall accept either of the following two mutually exclusive declarations in the Definitions section:

%array     Declare the type of yytext to be a null-terminated character array.
%pointer   Declare the type of yytext to be a pointer to a null-terminated character string.

The default type of yytext is implementation defined. If an application refers to yytext outside of the scanner source file (i.e., via an extern), the application shall include the appropriate %array or %pointer declaration in the scanner source file.

Implementations shall accept declarations in the Definitions section for setting certain internal table sizes. The declarations are shown in Table A-1. In the table, n represents a positive decimal integer, preceded by one or more <blank>s. The exact meaning of these table size numbers is implementation defined.
The implementation shall document how these numbers affect the lex utility and how they are related to any output that may be generated by the implementation should space limitations be encountered during the execution of lex. It shall be possible to determine from this output which of the table size values needs to be modified to permit lex to successfully generate tables for the input language. The values in the column Minimum Value represent the lowest values conforming implementations shall provide.

Table A-1 - lex Table Size Declarations

Declaration   Description                           Minimum Value
%p n          Number of positions                   2500
%n n          Number of states                      500
%a n          Number of transitions                 2000
%e n          Number of parse tree nodes            1000
%k n          Number of packed character classes    1000
%o n          Size of the output array              3000

A.2.7.2 lex Rules

The rules in lex source files are a table in which the left column contains regular expressions and the right column contains actions (C program fragments) to be executed when the expressions are recognized.

ERE    action
ERE    action
...

The extended regular expression (ERE) portion of a rule shall be separated from action by one or more <blank>s. A regular expression containing <blank>s shall be recognized under one of the following conditions: the entire expression appears within double-quotes; or the <blank>s appear within double-quotes or square brackets; or each <blank> is preceded by a backslash character.
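The rule for locating the end of the ERE portion can be sketched in C as a scan for the first <blank> that is not protected by one of the three conditions above. This is a minimal sketch under stated assumptions: ere_length is a hypothetical name, only space and tab are treated as <blank>s, and bracket expressions beginning with a literal ] (such as []a]) or containing [: :], [= =], or [. .] are not handled.

```c
#include <stddef.h>

/* Return the length of the ERE portion of a lex rule line: the prefix
 * ending at the first space or tab that is not inside double-quotes,
 * not inside a bracket expression, and not preceded by a backslash.
 * Simplification: "[]...]" and "[[:class:]...]" forms are not handled. */
size_t ere_length(const char *line)
{
    int in_quotes = 0, in_brackets = 0;
    size_t i = 0;

    for (; line[i] != '\0'; i++) {
        char c = line[i];
        if (c == '\\' && line[i + 1] != '\0') {
            i++;                       /* skip the escaped character */
        } else if (in_quotes) {
            if (c == '"')
                in_quotes = 0;         /* closing double-quote */
        } else if (in_brackets) {
            if (c == ']')
                in_brackets = 0;       /* end of bracket expression */
        } else if (c == '"') {
            in_quotes = 1;
        } else if (c == '[') {
            in_brackets = 1;
        } else if (c == ' ' || c == '\t') {
            break;                     /* first unprotected <blank> */
        }
    }
    return i;
}
```

For the line `"a b"   ECHO;`, the sketch returns 5, so the ERE is "a b" (with its quotes) and the action begins after the run of <blank>s.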
A.2.7.3 lex User Subroutines

Anything in the user subroutines section shall be copied to lex.yy.c following yylex().

A.2.7.4 lex Regular Expressions

The lex utility shall support the set of extended regular expressions (see 2.8.4), with the following additions and exceptions to the syntax:

"..."   Any string enclosed in double-quotes shall represent the characters within the double-quotes as themselves, except that backslash escapes (which appear in Table A-2) shall be recognized. Any backslash-escape sequence shall be terminated by the closing quote. For example, "\01""1" represents a single string: the octal value 1 followed by the character 1.

<state>r
<state1,state2,...>r
        The regular expression r shall be matched only when the program is in one of the start conditions indicated by state, state1, etc.; see A.2.7.5. (As an exception to the typographical conventions of the rest of this standard, in this case <state> does not represent a metavariable, but the literal angle-bracket characters surrounding a symbol.) The start condition shall be recognized as such only at the beginning of a regular expression.

r/x     The regular expression r shall be matched only if it is followed by an occurrence of regular expression x. The token returned in yytext shall only match r. If the trailing portion of r matches the beginning of x, the result is unspecified. The r expression cannot include further trailing context or the $ (match-end-of-line) operator; x cannot include the ^ (match-beginning-of-line) operator, nor trailing context, nor the $ operator.
        That is, only one occurrence of trailing context is allowed in a lex regular expression, and the ^ operator can be used only at the beginning of such an expression.

{name}  When name is one of the substitution symbols from the Definitions section (see A.2.7.1), the string, including the enclosing braces, shall be replaced by the substitute value. The substitute value shall be treated in the extended regular expression as if it were enclosed in parentheses. No substitution shall occur if {name} occurs within a bracket expression or within double-quotes.

Within an ERE, a backslash character shall be considered to begin an escape sequence as specified in Table 2-15 (see 2.12). In addition, the escape sequences in Table A-2 shall be recognized.

A literal <newline> character cannot occur within an ERE; the escape sequence \n can be used to represent a <newline>. A <newline> shall not be matched by a period operator.

The order of precedence given to extended regular expressions for lex differs from that specified in Table 2-13. The order of precedence for lex shall be as shown in Table A-3, from high to low.

NOTE: The escaped characters entry is not meant to imply that these are operators, but they are included in the table to show their relationships to the true operators. The start condition, trailing context, and anchoring notations have been omitted from the table because of the placement restrictions described in this subclause; they can only appear at the beginning or ending of an ERE.

Table A-2 - lex Escape Sequences

\digits    Description: A backslash character followed by the longest sequence of one, two, or three octal-digit characters (01234567). If all of the digits are 0 (i.e., representation of the NUL character), the behavior is undefined.
           Meaning: The character whose encoding is represented by the one-, two-, or three-digit octal integer. If the size of a byte on the system is greater than nine bits, the valid escape sequence used to represent a byte is implementation defined. Multibyte characters require multiple, concatenated escape sequences of this type, including the leading \ for each byte.
\xdigits   Description: A backslash character followed by the longest sequence of hexadecimal-digit characters (01234567abcdefABCDEF). If all of the digits are 0 (i.e., representation of the NUL character), the behavior is undefined.
           Meaning: The character whose encoding is represented by the hexadecimal integer.
\c         Description: A backslash character followed by any character not described in this table or in Table 2-15.
           Meaning: The character c, unchanged.

Table A-3 - lex ERE Precedence (from high to low)

collation-related bracket symbols   [= =]  [: :]  [. .]
escaped characters                  \<special character>
bracket expression                  [ ]
quoting                             "..."
grouping                            ( )
definition                          {name}
single-character RE duplication     *  +  ?
concatenation
interval expression                 {m,n}
alternation                         |

The ERE anchoring operators (^ and $) do not appear in Table A-3. With lex regular expressions, these operators are restricted in their use: the ^ operator can only be used at the beginning of an entire regular expression, and the $ operator only at the end. The operators apply to the entire regular expression. Thus, for example, the pattern (^abc)|(def$) is undefined; it can instead be written as two separate rules, one with the regular expression ^abc and one with def$, which share a common action via the special | action (see A.2.7.5). If the pattern were written ^abc|def$, it would match either abc or def on a line by itself. Note also that $ is a form of trailing context (it is equivalent to /\n) and as such cannot be used with regular expressions containing another instance of the / operator (see the preceding discussion of trailing context).

The trailing-context operator /, one of the lex additions to the ERE syntax, can be used as an ordinary character if presented within double-quotes, "/"; preceded by a backslash, \/; or within a bracket expression, [/]. The start-condition < and > operators shall be special only in a start condition at the beginning of a regular expression; elsewhere in the regular expression they shall be treated as ordinary characters.
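The practical effect of Table A-3 shows up when a lex pattern is reused with the POSIX <regex.h> interface, where an interval expression applies only to the single element before it. The following sketch uses regcomp() and regexec() to show the ERE reading of ab{3}; the helper name ere_matches is hypothetical. Because Table A-3 places interval expressions below concatenation, lex reads the same pattern as (ab){3}, so a pattern moved from a lex source to regcomp() needs explicit parentheses.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the POSIX extended regular expression pattern matches
 * somewhere in s; return false if it does not match or fails to
 * compile. */
bool ere_matches(const char *pattern, const char *s)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                  /* invalid pattern */
    int rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With regcomp(), ^ab{3}$ matches abbb but not ababab, while ^(ab){3}$ matches ababab, which is the string the lex pattern ab{3} matches.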
A.2.7.5 lex Actions

The action to be taken when an ERE is matched can be a C program fragment or the special actions described below; the program fragment can contain one or more C statements, and can also include special actions. The empty C statement ; shall be a valid action; any string in the input to the lex.yy.c program that matches the pattern portion of such a rule is effectively ignored or skipped. However, the absence of an action shall not be valid, and the action lex takes in such a condition is undefined.

The specification for an action, including C statements and/or special actions, can extend across several lines if enclosed in braces:

ERE <blank(s)> { program statement
                 program statement }

The default action when a string in the input to a lex.yy.c program is not matched by any expression shall be to copy the string to the output. Because the default behavior of a program generated by lex is to read the input and copy it to the output, a minimal lex source program that has just %% shall generate a C program that simply copies the input to the output unchanged.

Four special actions shall be available: "|", "ECHO;", "REJECT;", and "BEGIN":

|         The action | means that the action for the next rule is the action for this rule. Unlike the other three actions, | cannot be enclosed in braces or be semicolon-terminated; it shall be specified alone, with no other actions.
ECHO;     Write the contents of the string yytext on the output.
REJECT;   Usually only a single expression is matched by a given string in the input. REJECT means "continue to the next expression that matches the current input," and causes whatever rule was the second choice after the current rule to be executed for the same input.
          Thus, multiple rules can be matched and executed for one input string or overlapping input strings. For example, given the regular expressions xyz and xy and the input xyz, usually only the regular expression xyz would match. The next attempted match would start after z. If the last action in the xyz rule is REJECT, both this rule and the xy rule would be executed. The REJECT action may be implemented in such a fashion that flow of control does not continue after it, as if it were equivalent to a goto to another part of yylex(). The use of REJECT may result in somewhat larger and slower scanners.
BEGIN     The BEGIN newstate; action switches the state (start condition) to newstate. If the string newstate has not been declared previously as a start condition in the Definitions section, the results are unspecified. The initial state is indicated by the digit 0 or the token INITIAL.

The functions or macros described below are accessible to user code included in the lex input. It is unspecified whether they appear in the C code output of lex, or are accessible only through the -l l operand to c89 (the lex library).

int yylex(void)    Performs lexical analysis on the input; this is the primary function generated by the lex utility. The function shall return zero when the end of input is reached; otherwise it shall return nonzero values (tokens) determined by the actions that are selected.
int yymore(void)   When called, indicates that when the next input string is recognized, it is to be appended to the current value of yytext rather than replacing it; the value in yyleng shall be adjusted accordingly.
int yyless(int n)  Retains n initial characters in yytext, NUL-terminated, and treats the remaining characters as if they had not been read; the value in yyleng shall be adjusted accordingly.
int input(void)    Returns the next character from the input, or zero on end of file. It shall obtain input from the stream pointer yyin, although possibly via an intermediate buffer. Thus, once scanning has begun, the effect of altering the value of yyin is undefined. The character read is removed from the input stream of the scanner without any processing by the scanner.
int unput(int c)   Returns the character c to the input; yytext and yyleng are undefined until the next expression is matched. The result of unputting more characters than have been input is unspecified.

The following functions appear only in the lex library accessible through the -l l operand; they can therefore be redefined by a portable application:

int yywrap(void)   Called by yylex() at end of file; the default yywrap() shall always return 1. If the application requires yylex() to continue processing with another source of input, then the application can include a function yywrap(), which associates another file with the external variable FILE *yyin and shall return a value of zero.
int main(int argc, char *argv[])
                   Calls yylex() to perform lexical analysis, then exits. The user code can contain main() to perform application-specific operations, calling yylex() as applicable.

Except for input(), unput(), and main(), all external and static names generated by lex shall begin with the prefix yy or YY.

A.2.8 Exit Status

The lex utility shall exit with one of the following values:

 0    Successful completion.
>0    An error occurred.
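As a sketch of the yywrap() convention described in A.2.7.5, an application could chain several input files as follows. The file list (pending_files, pending_count) is hypothetical, and yyin is defined here only so that the fragment compiles on its own; in a real program yyin is the external variable provided by lex.yy.c, and this yywrap() would replace the one in the lex library.

```c
#include <stdio.h>

/* Stand-in for the scanner's external FILE *yyin; in a real program
 * this variable is defined in lex.yy.c, not here. */
FILE *yyin;

/* Hypothetical list of input pathnames still to be scanned. */
static const char **pending_files;
static int pending_count;

/* Called by yylex() at end of file.  Returns 0 after pointing yyin at
 * the next file that can be opened, or 1 when no input remains. */
int yywrap(void)
{
    while (pending_count > 0) {
        const char *path = *pending_files++;
        pending_count--;

        FILE *next = fopen(path, "r");
        if (next == NULL) {
            perror(path);              /* report and skip unreadable files */
            continue;
        }
        if (yyin != NULL && yyin != stdin)
            fclose(yyin);              /* done with the exhausted file */
        yyin = next;
        return 0;                      /* yylex() continues with yyin */
    }
    return 1;                          /* end of all input */
}
```

A main() in the user subroutines section would set pending_files and pending_count from argv before calling yylex(). Skipping unreadable files is a design choice of this sketch; an application could instead stop with an error.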
A.2.9 Consequences of Errors

Default.

BEGIN_RATIONALE

A.2.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The following is an example of a lex program that implements a rudimentary scanner for a Pascal-like syntax:

%{
/* need this for the call to atof() below */
#include <stdlib.h>
/* need this for printf(), fopen(), and stdin below */
#include <stdio.h>
%}

DIGIT    [0-9]
ID       [a-z][a-z0-9]*

%%

{DIGIT}+    {
            printf("An integer: %s (%d)\n", yytext,
                atoi(yytext));
            }

{DIGIT}+"."{DIGIT}*    {
            printf("A float: %s (%g)\n", yytext,
                atof(yytext));
            }

if|then|begin|end|procedure|function    {
            printf("A keyword: %s\n", yytext);
            }

{ID}        printf("An identifier: %s\n", yytext);

"+"|"-"|"*"|"/"    printf("An operator: %s\n", yytext);

"{"[^}\n]*"}"    /* eat up one-line comments */

[ \t\n]+    /* eat up white space */

.           printf("Unrecognized character: %s\n", yytext);

%%

int main(int argc, char *argv[])
{
    ++argv, --argc;  /* skip over program name */
    if (argc > 0) {
        yyin = fopen(argv[0], "r");
        if (yyin == NULL) {
            perror(argv[0]);  /* cannot open the input file */
            return 1;
        }
    } else
        yyin = stdin;
    yylex();
    return 0;
}

The following examples have been included to clarify the differences between lex regular expressions and regular expressions appearing elsewhere in this document. For regular expressions of the form r/x, the string matching r is always returned; confusion may arise when the beginning of x matches the trailing portion of r. For example, given the regular expression a*b/cc and the input aaabcc, yytext would contain the string aaab on this match. But given the regular expression x*/xy and the input xxxy, the token xxx, not xx, is returned by some implementations because xxx matches x*.

In the rule ab*/bc, the b* at the end of r will extend r's match into the beginning of the trailing context, so the result is unspecified.
If this rule were ab/bc, however, the rule matches the text ab when it is followed by the text bc. In this latter case, the matching of r cannot extend into the beginning of x, so the result is specified.

Unlike the general ERE rules, embedded anchoring is not allowed by most historical lex implementations. An example of embedded anchoring would be for patterns such as (^| )foo( |$) to match foo when it exists as a complete word. This functionality can be obtained using existing lex features:

^foo/[ \n]      |
" foo"/[ \n]    /* found foo as a separate word */

The precedence of regular expressions in lex does not match that of extended regular expressions in Section 2 because of historical practice. In System V lex and its predecessors, a regular expression of the form ab{3} matches ababab; an ERE, such as used by egrep, would match abbb. Changing this precedence for uniformity with egrep would have been desirable, but too many applications would break in nonobvious ways.

Conforming applications are warned that in the Rules section, an ERE without an action is not acceptable, but need not be detected as erroneous by lex. This may result in compilation or run-time errors.

The purpose of input() is to take characters off the input stream and discard them as far as the lexical analysis is concerned. A common use is to discard the body of a comment once the beginning of a comment is recognized.

History of Decisions Made

Even though the -c option and references to the C language are retained in this description, lex may be generalized to other languages, as was done at one time for EFL, Extended FORTRAN Language.
Since the lex input specification is essentially language independent, versions of this utility could be written to produce Ada, Modula-2, or Pascal code, and there are known historical implementations that do so.

The current description of lex bypasses the issue of dealing with internationalized regular expressions in the lex source code or generated lexical analyzer. If it follows the model used by awk (the source code is assumed to be presented in the POSIX Locale, but input and output are in the locale specified by the environment variables), then the tables in the lexical analyzer produced by lex would interpret regular expressions specified in the lex source in terms of the environment variables specified when lex was executed. The desired effect would be to have the lexical analyzer interpret the regular expressions given in the lex source according to the environment specified when the lexical analyzer is executed, but this is not possible with the current lex technology. Major international vendors believe that only limited internationalization is required for the POSIX.2 lex. The theoretically desirable goal of runtime-selectable locales is not feasible in the near future. Furthermore, the very nature of the lexical analyzers produced by lex must be closely tied to the lexical requirements of the input language being described, which will frequently be locale-specific anyway. (For example, writing an analyzer that is used for French text will not automatically be useful for processing other languages.) The text in the Environment Variables subclause allows locale-specific regular expression handling, but mandates only something similar to that provided in historical implementations.
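The octal and hexadecimal escape sequences of Table A-2, including the "\01""1" example in A.2.7.4, can be illustrated with a short C decoder. This is a minimal sketch under stated assumptions: decode_escape is a hypothetical name, only \n and \t from Table 2-15 are handled (the other entries of that table are omitted for brevity), a \x not followed by a hexadecimal digit is treated as a literal x, and the undefined all-zero cases and values too large for a byte are not diagnosed.

```c
#include <ctype.h>
#include <stddef.h>

/* Decode one escape sequence starting at s, which must point at a
 * backslash.  Stores the number of characters consumed in *used and
 * returns the decoded value. */
unsigned long decode_escape(const char *s, size_t *used)
{
    unsigned long value = 0;
    size_t i = 1;                          /* skip the backslash */

    if (s[i] >= '0' && s[i] <= '7') {
        /* \digits: the longest run of one, two, or three octal digits */
        while (i <= 3 && s[i] >= '0' && s[i] <= '7')
            value = value * 8 + (unsigned long)(s[i++] - '0');
    } else if (s[i] == 'x' && isxdigit((unsigned char)s[i + 1])) {
        /* \xdigits: the longest run of hexadecimal digits */
        for (i++; isxdigit((unsigned char)s[i]); i++) {
            int c = tolower((unsigned char)s[i]);
            value = value * 16
                  + (unsigned long)(isdigit(c) ? c - '0' : c - 'a' + 10);
        }
    } else if (s[i] == 'n') {
        value = '\n';
        i++;
    } else if (s[i] == 't') {
        value = '\t';
        i++;
    } else if (s[i] != '\0') {
        value = (unsigned char)s[i];       /* \c: the character c, unchanged */
        i++;
    }
    *used = i;
    return value;
}
```

Applied to the text \01" (the first quoted string of the "\01""1" example), the sketch returns 1 after consuming three characters: the closing quote terminates the octal sequence, so the following "1" is a separate character.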
The description of octal- and hexadecimal-digit escape sequences agrees with the C Standard {7} use of escape sequences. See the rationale for ed for a discussion of bytes larger than nine bits being represented by octal values. Hexadecimal values can represent larger bytes and multibyte characters directly, using as many digits as required.

There is no detailed output format specification. The observed behavior of lex under four different historical implementations was that none of these implementations consistently reported the line numbers for error and warning messages. Furthermore, there was a desire that lex be allowed to output additional diagnostic messages. Leaving message formats unspecified sidesteps these formatting questions and also avoids problems with internationalization.

Although the %x specifier for exclusive start conditions is not existing practice, it is believed to be a minor change to historical implementations, and greatly enhances the usability of lex programs since it permits an application to obtain the expected functionality with fewer statements.

The %array and %pointer declarations were added as a compromise between historical systems. The System V-based lex copies the matched text to a yytext array. The flex program, supported in BSD and GNU systems, uses a pointer. In the latter case, significant performance improvements are available for some scanners. Most existing programs should require no change in porting from one system to another because the string being referenced is null-terminated in both cases. (The method used by flex in its case is to null-terminate the token in place by remembering the character that used to come right after the token and replacing it before continuing on to the next scan.)
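The in-place null termination described for flex can be sketched in C as follows. The structure and function names (token_buf, token_begin, token_end) are hypothetical and do not describe flex's actual internals; the sketch only shows the save, terminate, and restore steps. It assumes the buffer holds at least one character (for example, its trailing NUL) after the token.

```c
#include <stddef.h>

/* Hypothetical scanner buffer state for one token. */
struct token_buf {
    char  *buf;       /* input buffer; the token lies inside it */
    char   saved;     /* character overwritten by the terminator */
    size_t end;       /* index where the '\0' was written */
};

/* Make buf + start a null-terminated token of length len without
 * copying it, remembering the character the terminator replaces. */
char *token_begin(struct token_buf *tb, size_t start, size_t len)
{
    tb->end = start + len;
    tb->saved = tb->buf[tb->end];
    tb->buf[tb->end] = '\0';
    return tb->buf + start;   /* plays the role of yytext under %pointer */
}

/* Put the remembered character back before the next scan begins. */
void token_end(struct token_buf *tb)
{
    tb->buf[tb->end] = tb->saved;
}
```

Because no copy is made, yytext points into the buffer itself, which is why the text is valid only until the next scan restores the overwritten character.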
Multifile programs with external references to yytext outside the scanner source file should continue to operate on their existing systems, but would require one of the new declarations to be considered strictly portable.

The description of regular expressions avoids unnecessary duplication of regular expression details. Specifically, the | operator and {m,n} interval expression are not listed in A.2.7.4 because their meanings within a lex regular expression are the same as those for extended regular expressions.

The reason for the undefined condition associated with text beginning with a <blank> or within %{ and %} delimiter lines appearing in the Rules section is historical practice. Both BSD and System V lex copy the indented (or enclosed) input in the Rules section (except at the beginning) to unreachable areas of the yylex() function (the code is written directly after a break statement). In some cases, the System V lex generates an error message or a syntax error, depending on the form of indented input.

The intention in breaking the list of functions into those that may appear in lex.yy.c versus those that only appear in libl.a is that only those functions in libl.a can be reliably redefined by a portable application.

The descriptions of Standard Output and Standard Error are somewhat complicated because historical lex implementations chose to issue diagnostic messages to standard output (unless -t was given). POSIX.2 allows this behavior, but leaves an opening for the more expected behavior of using standard error for diagnostics. Also, the System V behavior of writing the statistics when any table sizes are given is allowed, while BSD-derived systems can avoid it.
The programmer can always obtain precisely the desired results by using either the -t or -n options.

The Operands subclause does not mention the use of - as a synonym for standard input; not all historical implementations support such usage for any of the file operands.

The description of the Translation Table was deleted from earlier drafts because of its relatively low usage in historical applications.

The change to the definition of the input() function that allows buffering of input presents the opportunity for major performance gains in some applications.

END_RATIONALE

A.3 yacc - Yet another compiler compiler

A.3.1 Synopsis

yacc [-dltv] [-b file_prefix] [-p sym_prefix] grammar

A.3.2 Description

The yacc utility shall read a description of a context-free grammar in grammar and write C source code, conforming to the C Standard {7}, to a code file, and optionally header information into a header file, in the current directory. The C code shall define a function and related routines and macros for an automaton that executes a parsing algorithm meeting the requirements in A.3.7.8.

The form and meaning of the grammar are described in A.3.7.

The C source code and header file shall be produced in a form suitable as input for the C compiler (see c89 in A.1).

A.3.3 Options

The yacc utility shall conform to the utility argument syntax guidelines described in 2.10.2.

The following options shall be supported by the implementation:

-b file_prefix   Use file_prefix instead of y as the prefix for all output filenames.
     The code file y.tab.c, the header file y.tab.h (created when -d is specified), and the description file y.output (created when -v is specified) shall be changed to file_prefix.tab.c, file_prefix.tab.h, and file_prefix.output, respectively.

-d   Write the header file; by default only the code file is written.

-l   Produce a code file that does not contain any #line constructs. If this option is not present, it is unspecified whether the code file or header file contains #line directives.

-p sym_prefix
     Use sym_prefix instead of yy as the prefix for all external names produced by yacc. The names affected shall include the functions yyparse(), yylex(), and yyerror(), and the variables yylval, yychar, and yydebug. (In the remainder of this clause, the six symbols cited are referenced using their default names only as a notational convenience.) Local names may also be affected by the -p option; however, the -p option shall not affect yacc-generated #define symbols.

-t   Modify conditional compilation directives to permit compilation of debugging code in the code file. Runtime debugging statements shall always be contained in the code file, but by default conditional compilation directives prevent their compilation.

-v   Write a file containing a description of the parser and a report of conflicts generated by ambiguities in the grammar.

A.3.4 Operands

The following operand is required:

grammar
     A pathname of a file containing instructions, hereafter called grammar, for which a parser is to be created. The format for the grammar is described in A.3.7.

A.3.5 External Influences

A.3.5.1 Standard Input

None.
A.3.5.2 Input Files

The file grammar shall be a text file formatted as specified in A.3.7.

A.3.5.3 Environment Variables

The following environment variables shall affect the execution of yacc:

LANG
     This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL
     This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE
     This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

LC_MESSAGES
     This variable shall determine the language in which messages should be written.

The LANG and LC_* variables shall affect the execution of the yacc utility as stated. The main() function defined in A.3.7.6 shall call setlocale(LC_ALL, ""), and thus the program generated by yacc shall also be affected by the contents of these variables at runtime.

A.3.5.4 Asynchronous Events

Default.

A.3.6 External Effects

A.3.6.1 Standard Output

None.

A.3.6.2 Standard Error

If shift/reduce or reduce/reduce conflicts are detected in grammar, yacc writes a report of those conflicts to the standard error in an unspecified format.

Standard error is also used for diagnostic messages.

A.3.6.3 Output Files

The code file, the header file, and the description file shall be text files. All are described in the following subclauses.
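The output filenames follow a simple pattern: a prefix (y by default, or the -b file_prefix argument) joined to the fixed suffixes .tab.c, .tab.h, and .output. The helper yacc_output_name() below is hypothetical and not part of yacc or its library; it is a sketch of that naming rule only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of yacc): build the name of one output
 * file from a prefix ("y" by default, or the -b file_prefix argument)
 * and one of the fixed suffixes ".tab.c", ".tab.h", or ".output".
 * Returns 0 on success, or -1 if buf is too small for the full name. */
int yacc_output_name(char *buf, size_t size, const char *file_prefix,
                     const char *suffix)
{
    assert(buf != NULL && file_prefix != NULL && suffix != NULL);
    int n = snprintf(buf, size, "%s%s", file_prefix, suffix);
    if (n < 0 || (size_t)n >= size)
        return -1; /* encoding error or truncated output */
    return 0;
}
```

With -b calc, for example, the header file written under -d would be calc.tab.h rather than y.tab.h.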
A.3.6.3.1 Code file

This file shall contain the C source code for the yyparse() routine. It shall contain code for the various semantic actions with macro substitution performed on them as described in A.3.7. It shall also contain a copy of the #define statements in the header file. If a %union declaration is used, the declaration for YYSTYPE shall also be included in this file.

The contents of the Programs Section (see A.3.7.1.4) of the input file shall then be included.

A.3.6.3.2 Header file

The header file shall contain #define statements that associate the token numbers with the token names. This allows source files other than the code file to access the token codes. If a %union declaration is used, the declaration for YYSTYPE and an extern YYSTYPE yylval declaration shall also be included in this file.

A.3.6.3.3 Description file

The description file shall be a text file containing a description of the state machine corresponding to the parser, using an unspecified format. Limits for internal tables (see A.3.7.9) also shall be reported, in an implementation-defined manner.

A.3.7 Extended Description

The yacc command accepts a language that is used to define a grammar for a target language to be parsed by the tables and code generated by yacc. The language accepted by yacc as a grammar for the target language is described below using the yacc input language itself.

The input grammar includes rules describing the input structure of the target language, and code to be invoked when these rules are recognized to provide the associated semantic action. The code to be executed shall appear as bodies of text that are intended to be C language code. The C language inclusions are presumed to form a correct function when processed by yacc into its output files. The code included in this way shall be executed during the recognition of the target language.
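To make the header file of A.3.6.3.2 concrete, the following sketch shows what it might contain for a grammar that uses %union, together with the single definition of yylval in the code file and a separately compiled lexer routine that uses the shared names. The grammar declarations, the token numbers 257 and 258, and the helper lex_number() are assumptions for illustration; actual yacc output differs between implementations.

```c
#include <assert.h>

/* ---- What a header file such as y.tab.h might hold ----
 * Assumed (hypothetical) grammar declarations:
 *     %union { int ival; char *sval; }
 *     %token <ival> NUMBER
 *     %token <sval> NAME
 * The token numbers are illustrative; yacc picks its own values
 * (greater than 256) unless the grammar assigns them explicitly. */
#define NUMBER 257
#define NAME   258

typedef union {
    int   ival; /* value carried by NUMBER tokens */
    char *sval; /* value carried by NAME tokens   */
} YYSTYPE;

extern YYSTYPE yylval;

/* ---- In the code file: the one definition of yylval ---- */
YYSTYPE yylval;

/* ---- In another source file that includes the header ----
 * Hypothetical lexer helper: report a NUMBER token and its value. */
int lex_number(int value)
{
    yylval.ival = value; /* the semantic value travels through yylval */
    return NUMBER;       /* the token number is returned symbolically */
}
```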
Given a grammar, the yacc utility generates the files described in A.3.6.3. The code file can be compiled and linked using c89. If the declarations and programs sections of the grammar file did not include definitions of main(), yylex(), and yyerror(), the compiled output requires linking with externally supplied versions of those functions. Default versions of main() and yyerror() are supplied in the yacc library and can be linked in by using the -l y operand to c89. The yacc library interfaces need not support interfaces with other than the default yy symbol prefix. The application provides the lexical analyzer function, yylex(); the lex utility (see A.2) is specifically designed to generate such a routine.

A.3.7.1 Input Language

Every specification file shall consist of three sections: declarations, grammar rules, and programs, separated by double percent signs (%%). The declarations and programs sections can be empty. If the latter is empty, the preceding %% mark separating it from the rules section can be omitted.

The input is free-form text following the structure of the grammar defined below.

A.3.7.1.1 Lexical Structure of the Grammar

The characters <blank>s, <newline>s, and <form-feed>s shall be ignored, except that they shall not appear in names or multicharacter reserved symbols. Comments shall be enclosed in /* ... */, and can appear wherever a name is valid.

Names are of arbitrary length, made up of letters, periods (.), underscores (_), and noninitial digits. Uppercase and lowercase letters are distinct. Portable applications shall not use names beginning in yy or YY since the yacc parser uses such names.
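The naming rule can be checked mechanically. The functions below are hypothetical sketches, not part of any yacc implementation; "letter" is approximated with isalpha() in the current C locale, which is an assumption. The second function captures why a valid yacc name is not always a valid C identifier: a period is permitted by yacc but not by C.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check of the yacc name rule: letters, periods,
 * underscores, and noninitial digits. Letters are approximated with
 * isalpha() in the current C locale (an assumption). */
bool is_yacc_name(const char *s)
{
    if (*s == '\0' || isdigit((unsigned char)*s))
        return false; /* empty, or begins with a digit */
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (!isalpha(c) && !isdigit(c) && c != '.' && c != '_')
            return false;
    }
    return true;
}

/* A yacc name can appear in a C #define only if it has no period,
 * because C identifiers cannot contain one. */
bool usable_as_c_macro(const char *s)
{
    return is_yacc_name(s) && strchr(s, '.') == NULL;
}

/* Names a portable application must avoid: those beginning in yy or YY. */
bool has_reserved_prefix(const char *s)
{
    return strncmp(s, "yy", 2) == 0 || strncmp(s, "YY", 2) == 0;
}
```

This matches the statement in A.3.7.4 that token names containing characters not permitted by C are left out of the #define declarations.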
Many of the names appear in the final output of yacc, and thus they should be chosen to conform with any additional rules created by the C compiler to be used. In particular, they will appear in #define statements.

A literal shall consist of a single character enclosed in single-quotes ('). All of the escape sequences supported for character constants by the C Standard {7} (3.1.3.4) shall be supported by yacc.

The relationship with the lexical analyzer is discussed in detail below.

The NUL character shall not be used in grammar rules or literals.

A.3.7.1.2 Declarations Section

The declarations section is used to define the symbols used to define the target language and their relationship with each other. In particular, much of the additional information required to resolve ambiguities in the context-free grammar for the target language is provided here.

Usually yacc assigns the relationship between the symbolic names it generates and their underlying numeric value. The declarations section makes it possible to control the assignment of these values.

It is also possible to keep semantic information associated with the tokens currently on the parse stack in a user-defined C language union, if the members of the union are associated with the various names in the grammar. The declarations section provides for this as well.

The first group of declarators below all take a list of names as arguments. That list can optionally be preceded by the name of a C union member (called a tag below) appearing within ``<'' and ``>''. (As an exception to the typographical conventions of the rest of this standard, in this case <tag> does not represent a metavariable, but the literal angle bracket characters surrounding a symbol.)
The use of tag specifies that the tokens named on this line are to be of the same C type as the union member referenced by tag. This is discussed in more detail below.

For lists used to define tokens, the first appearance of a given token can be followed by a positive integer (as a string of decimal digits). If this is done, the underlying value assigned to it for lexical purposes shall be taken to be that number.

%token [<tag>] name [number] [name [number]]...
     Declares name(s) to be a token. If tag is present, the C type for all tokens on this line shall be declared to be the type referenced by tag. If a positive integer, number, follows a name, that value shall be assigned to the token.

%left [<tag>] name [number] [name [number]]...
%right [<tag>] name [number] [name [number]]...
     Declares name to be a token, and assigns precedence to it. One or more lines, each beginning with one of these symbols, can appear in this section. All tokens on the same line have the same precedence level and associativity; the lines are in order of increasing precedence or binding strength. %left denotes that the operators on that line are left associative, and %right similarly denotes right associative operators. If tag is present, it shall declare a C type for name(s) as described for %token.

%nonassoc [<tag>] name [number] [name [number]]...
     Declares name to be a token, and indicates that this cannot be used associatively. If the parser encounters associative use of this token, it shall report an error. If tag is present, it shall declare a C type for name(s) as described for %token.

%type <tag> name...
     Declares that name(s) are nonterminals whose values have the type of the union member referenced by tag; a <tag> is therefore required at the beginning of the list. Because it deals with nonterminals only, assigning a token number or using a literal is also prohibited. If this construct is present, yacc shall perform type checking; if this construct is not present, the parse stack shall hold only the int type.

Every name used in grammar that is not defined by a %token, %left, %right, or %nonassoc declaration is assumed to represent a nonterminal symbol. The yacc utility shall report an error for any nonterminal symbol that does not appear on the left side of at least one grammar rule.

Once the type, precedence, or token number of a name is specified, it shall not be changed. If the first declaration of a token does not assign a token number, yacc shall assign a token number. Once this assignment is made, the token number shall not be changed by explicit assignment.

The following declarators do not follow the previous pattern.

%start name
     Declares the nonterminal name to be the start symbol, which represents the largest, most general structure described by the grammar rules. By default, it is the left-hand side of the first grammar rule; this default can be overridden with this declaration.

%union { body of union (in C) }
     Declares the yacc value stack to be a union of the various types of values desired. By default, the values returned by actions (see below) and the lexical analyzer shall be integers. The yacc utility keeps track of types, and shall insert corresponding union member names in order to perform strict type checking of the resulting parser.

     Alternatively, given that at least one <tag> construct is used, the union can be declared in a header file (which shall be included in the declarations section by using an #include construct within %{ and %}), and a typedef used to define the symbol YYSTYPE to represent this union.
     The effect of %union is to provide the declaration of YYSTYPE directly from the input.

%{ ... %}
     C language declarations and definitions can appear in the declarations section, enclosed by these marks. These statements shall be copied into the code file, and have global scope within it so that they can be used in the rules and programs sections.

The declarations section shall be terminated by the token %%.

A.3.7.1.3 Grammar Rules

The rules section defines the context-free grammar to be accepted by the function yacc generates, and associates with those rules C language actions and additional precedence information. The grammar is described below, and a formal definition follows.

The rules section consists of one or more grammar rules. A grammar rule has the form:

     A : BODY ;

The symbol A represents a nonterminal name, and BODY represents a sequence of zero or more names, literals, and semantic actions that can then be followed by optional precedence rules. Only the names and literals participate in the formation of the grammar; the semantic actions and precedence rules are used in other ways. The colon and the semicolon are yacc punctuation. If there are several successive grammar rules with the same left-hand side, the vertical bar | can be used to avoid rewriting the left-hand side; in this case the semicolon appears only after the last rule. The BODY part can be empty (or empty of names and literals) to indicate that the nonterminal symbol matches the empty string.

The yacc utility assigns a unique number to each rule. Rules using the vertical bar notation are distinct rules. The number assigned to the rule appears in the description file.
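One way to picture rule numbering is a table in which each alternative separated by | is its own entry. The sketch below is hypothetical: struct rule, the rules table, and both helper functions are illustrative names, and numbering rules from 1 in order of appearance is an assumption (the actual numbers appear only in the description file). It also checks the requirement from A.3.7.1.2 that every nonterminal appear on the left side of at least one rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule representation: a left-hand side and a
 * NULL-terminated body of at most 7 names or literals. */
struct rule {
    const char *lhs;
    const char *rhs[8];
};

/* The grammar
 *     list : (empty) | list item ;
 *     item : NUMBER | '(' list ')' ;
 * becomes four distinct rules, one per alternative.
 * Rule number = array index + 1 (an assumption). */
static const struct rule rules[] = {
    { "list", { NULL } },                        /* rule 1: empty body */
    { "list", { "list", "item", NULL } },        /* rule 2 */
    { "item", { "NUMBER", NULL } },              /* rule 3 */
    { "item", { "'('", "list", "')'", NULL } },  /* rule 4 */
};
static const size_t nrules = sizeof rules / sizeof rules[0];

/* True if some rule has name on its left-hand side; yacc reports an
 * error for any nonterminal for which this is false. */
bool has_defining_rule(const char *name)
{
    for (size_t i = 0; i < nrules; i++)
        if (strcmp(rules[i].lhs, name) == 0)
            return true;
    return false;
}

/* Number of names and literals in the body of rule n (1-based). */
size_t body_length(size_t n)
{
    size_t len = 0;
    while (rules[n - 1].rhs[len] != NULL)
        len++;
    return len;
}
```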
The elements comprising a BODY are:

name
literal
     These form the rules of the grammar: name is either a token or a nonterminal; literal stands for itself (less the lexically required quotation marks).

semantic action
     With each grammar rule, the user can associate actions to be performed each time the rule is recognized in the input process. [Note that the word ``action'' can also refer to the actions of the parser (shift, reduce, etc.).]

     These actions can return values and can obtain the values returned by previous actions. These values shall be kept in objects of type YYSTYPE (see %union). The result value of the action shall be kept on the parse stack with the left-hand side of the rule, to be accessed by other reductions as part of their right-hand side. By using the <tag> information provided in the declarations section, the code generated by yacc can be strictly type checked and contain arbitrary information. In addition, the lexical analyzer can provide the same kinds of values for tokens, if desired.

     An action is an arbitrary C statement, and as such can do input or output, call subprograms, and alter external variables. An action is one or more C statements enclosed in curly braces { and }.

     Certain pseudo-variables can be used in the action. These are macros for access to data structures known internally to yacc.

     $$   The value of the action can be set by assigning it to $$. If type checking is enabled and the type of the value to be assigned cannot be determined, a diagnostic message may be generated.

     $number
          This refers to the value returned by the component specified by number in the right side of a rule, reading from left to right; number can be zero or negative.
          If number is zero or negative, it refers to the data associated with the name on the parser's stack preceding the leftmost symbol of the current rule. (That is, $0 refers to the name immediately preceding the leftmost name in the current rule, to be found on the parser's stack, and $-1 refers to the symbol to its left.) If number refers to an element past the current point in the rule, or beyond the bottom of the stack, the result is undefined. If type checking is enabled and the type of the value to be assigned cannot be determined, a diagnostic message may be generated.

     $<tag>number
          These correspond exactly to the corresponding symbols without the tag inclusion, but allow for strict type checking (and preclude unwanted type conversions). The effect is that the macro is expanded to use tag to select an element from the YYSTYPE union (using dataname.tag). This is particularly useful if number is not positive.

     $<tag>$
          This imposes on the reference the type of the union member referenced by tag. This construction is applicable when a reference to a left context value occurs in the grammar, and provides yacc with a means for selecting a type.

     Actions can occur in the middle of a rule as well as at the end; an action can access values returned by actions to its left, and in turn the value it returns can be accessed by actions to its right. An action appearing in the middle of a rule shall be equivalent to replacing the action with a new nonterminal symbol and adding an empty rule with that nonterminal symbol on the left-hand side. The semantic action associated with the new rule shall be equivalent to the original action. The use of actions within rules might introduce conflicts that would not otherwise exist.
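The $number notation can be read as an offset from the top of the parser's value stack. When a rule whose body has len symbols is about to be reduced, their values occupy the top len entries, so $n lives at top - len + n; that is why $0 and $-1 reach the entries beneath the current rule. The sketch below assumes an int-valued stack (no %union), and value_stack, top, push(), dollar(), reduce(), and add_action() are hypothetical names, not yacc's generated code. It also applies yacc's default of giving a rule the value of its first element; using 0 for an empty rule is a placeholder assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical int-valued parser value stack (no %union). */
#define STACK_MAX 100
static int value_stack[STACK_MAX];
static int top = -1; /* index of the topmost value; -1 means empty */

static void push(int v)
{
    assert(top + 1 < STACK_MAX);
    value_stack[++top] = v;
}

/* $n for a rule whose body of len symbols is on the stack:
 * $1 is the leftmost body symbol, $0 the value just below the rule,
 * $-1 the value below that. */
static int dollar(int n, int len)
{
    int index = top - len + n;
    assert(n <= len);   /* past the current point: undefined */
    assert(index >= 0); /* beyond the bottom of the stack: undefined */
    return value_stack[index];
}

/* Reduce by a rule of length len: $$ starts as $1 (0 for an empty
 * body, a placeholder), the action may replace it, the body values
 * are popped, and $$ is pushed for the left-hand side. */
static void reduce(int len, int (*action)(int len))
{
    int result = (len > 0) ? dollar(1, len) : 0;
    if (action != NULL)
        result = action(len);
    top -= len;
    push(result);
}

/* Hypothetical action for  expr : expr '+' expr  { $$ = $1 + $3; } */
static int add_action(int len)
{
    return dollar(1, len) + dollar(3, len);
}
```

For example, with 7 below the rule and 2, '+', 3 as its body, $0 yields 7 and reducing with add_action() leaves 5 where the body was.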
     By default, the value of a rule shall be the value of the first element in it. If the first element does not have a type (particularly in the case of a literal) and type checking is turned on by %type, an error message shall result.

precedence
     The keyword %prec can be used to change the precedence level associated with a particular grammar rule. Examples of this are in cases where a unary and binary operator have the same symbolic representation, but need to be given different precedences, or where the handling of an ambiguous if-else construction is necessary. The reserved symbol %prec can appear immediately after the body of the grammar rule and can be followed by a token name or a literal. It shall cause the precedence of the grammar rule to become that of the following token name or literal. The action for the rule as a whole can follow %prec.

If a programs section follows, the grammar rules shall be terminated by %%.

A.3.7.1.4 Programs Section

The programs section can include the definition of the lexical analyzer yylex(), and any other functions, for example those used in the actions specified in the grammar rules. This is C language code, and shall be included in the code file after the tables and code generated by yacc. It is unspecified whether the programs section precedes or follows the semantic actions in the output file; therefore, if the application contains any macro definitions and declarations intended to apply to the code in the semantic actions, it shall place them within %{ ... %} in the declarations section.

A.3.7.1.5 Input Grammar

The following input to yacc yields a parser for the input to yacc.
This is to be taken as the formal specification of the grammar of yacc, notwithstanding conflicts that may appear elsewhere. The lexical structure is defined less precisely; A.3.7.1.1 defines most terms. The correspondence between the previous terms and the tokens below is as follows.

IDENTIFIER
     This corresponds to the concept of name, given previously. It also includes literals as defined previously.

C_IDENTIFIER
     This is a name, and additionally it is known to be followed by a colon. A literal cannot yield this token.

NUMBER
     A string of digits (a nonnegative decimal integer).

TYPE, LEFT, MARK, etc.
     These correspond directly to %type, %left, %%, etc.

{ ... }
     This indicates C language source code, with the possible inclusion of $ macros as discussed previously.

     /* Grammar for the input to yacc */
     /* Basic entries */
     /* The following are recognized by the lexical analyzer */

     %token IDENTIFIER    /* includes identifiers and literals */
     %token C_IDENTIFIER  /* identifier (but not literal) followed by a : */
     %token NUMBER        /* [0-9][0-9]* */

     /* Reserved words : %type=>TYPE %left=>LEFT, etc. */

     %token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION

     %token MARK          /* the %% mark */
     %token LCURL         /* the %{ mark */
     %token RCURL         /* the %} mark */

     /* 8-bit character literals stand for themselves; */
     /* tokens have to be defined for multibyte characters */

     %start spec

     %%

     spec  : defs MARK rules tail
           ;
     tail  : MARK
             {
               /* In this action, set up the rest of the file */
             }
           | /* empty; the second MARK is optional */
           ;
     defs  : /* empty */
           | defs def
           ;
     def   : START IDENTIFIER
           | UNION
             {
               /* Copy union definition to output */
             }
           | LCURL
             {
               /* Copy C code to output file */
             }
             RCURL
           | rword tag nlist
           ;
     rword : TOKEN
           | LEFT
           | RIGHT
           | NONASSOC
           | TYPE
           ;
     tag   : /* empty: union tag id optional */
           | '<' IDENTIFIER '>'
           ;
     nlist : nmno
           | nlist nmno
           ;
     nmno  : IDENTIFIER         /* Note: literal invalid with %type */
           | IDENTIFIER NUMBER  /* Note: invalid with %type */
           ;

     /* rule section */

     rules : C_IDENTIFIER rbody prec
           | rules rule
           ;
     rule  : C_IDENTIFIER rbody prec
           | '|' rbody prec
           ;
     rbody : /* empty */
           | rbody IDENTIFIER
           | rbody act
           ;
     act   : '{'
             {
               /* Copy action, translate $$, etc. */
             }
             '}'
           ;
     prec  : /* empty */
           | PREC IDENTIFIER
           | PREC IDENTIFIER act
           | prec ';'
           ;

A.3.7.2 Conflicts

The parser produced for an input grammar may contain states in which conflicts occur. The conflicts occur because the grammar is not LALR(1). An ambiguous grammar always contains at least one LALR(1) conflict. The yacc utility shall resolve all conflicts, using either default rules or user-specified precedence rules.

Conflicts are either ``shift/reduce conflicts'' or ``reduce/reduce conflicts.'' A shift/reduce conflict is where, for a given state and lookahead symbol, both a shift action and a reduce action are possible.
A reduce/reduce conflict is where, for a given state and lookahead symbol, reductions by two different rules are possible.

The rules below describe how to specify what actions to take when a conflict occurs. Not all shift/reduce conflicts can be successfully resolved this way because the conflict may be due to something other than ambiguity, so incautious use of these facilities can cause the language accepted by the parser to be much different from what was intended. The description file shall contain sufficient information to understand the cause of the conflict. Where ambiguity is the reason, either the default or explicit rules should be adequate to produce a working parser.

The declared precedences and associativities (see A.3.7.1.2) are used to resolve parsing conflicts as follows:

(1) A precedence and associativity is associated with each grammar rule; it is the precedence and associativity of the last token or literal in the body of the rule. If the %prec keyword is used, it overrides this default. Some grammar rules might not have both precedence and associativity.

(2) If there is a shift/reduce conflict, and both the grammar rule and the input symbol have precedence and associativity associated with them, then the conflict is resolved in favor of the action (shift or reduce) associated with the higher precedence. If the precedences are the same, then the associativity is used; left associative implies reduce, right associative implies shift, and nonassociative implies an error in the string being parsed.

(3) When there is a shift/reduce conflict that cannot be resolved by rule (2), the shift is done. Conflicts resolved this way are counted in the diagnostic output described in A.3.7.3.

(4) When there is a reduce/reduce conflict, a reduction is done by the grammar rule that occurs earlier in the input sequence.
Conflicts resolved this way are counted in the diagnostic output described in A.3.7.3.

Conflicts resolved by precedence or associativity shall not be counted in the shift/reduce and reduce/reduce conflicts reported by yacc on either standard error or in the description file.

A.3.7.3 Error Handling

The token error shall be reserved for error handling. The name error can be used in grammar rules. It indicates places where the parser can recover from a syntax error. The default value of error shall be 256. Its value can be changed using a %token declaration. The lexical analyzer should not return the value of error.

The parser shall detect a syntax error when it is in a state where the action associated with the lookahead symbol is error.

A semantic action can cause the parser to initiate error handling by executing the macro YYERROR. When YYERROR is executed, the semantic action shall pass control back to the parser. YYERROR cannot be used outside of semantic actions.

When the parser detects a syntax error, it normally calls yyerror with the character string "syntax error" as its argument. The call shall not be made if the parser is still recovering from a previous error when the error is detected. The parser is considered to be recovering from a previous error until the parser has shifted over at least three normal input symbols since the last error was detected or a semantic action has executed the macro yyerrok. The parser shall not call yyerror when YYERROR is executed.

The macro function YYRECOVERING() shall return 1 if a syntax error has been detected and the parser has not yet fully recovered from it. Otherwise, zero shall be returned.

When a syntax error is detected by the parser, the parser shall check if a previous syntax error has been detected.
If a previous error was detected, and if no normal input symbols have been shifted since the preceding error was detected, the parser checks if the lookahead symbol is an endmarker (see A.3.7.4). If it is, the parser shall return with a nonzero value. Otherwise, the lookahead symbol shall be discarded and normal parsing shall resume.

When YYERROR is executed, or when the parser detects a syntax error and either no previous error has been detected or at least one normal input symbol has been shifted since the previous error was detected, the parser shall pop back one state at a time until the parse stack is empty or the current state allows a shift over error. If the parser empties the parse stack, it shall return with a nonzero value. Otherwise, it shall shift over error and then resume normal parsing. If the parser had read a lookahead symbol before the error was detected, that symbol shall still be the lookahead symbol when parsing is resumed.

The macro yyerrok in a semantic action shall cause the parser to act as if it has fully recovered from any previous errors. The macro yyclearin shall cause the parser to discard the current lookahead token. If the current lookahead token has not yet been read, yyclearin shall have no effect.

The macro YYACCEPT shall cause the parser to return with the value zero. The macro YYABORT shall cause the parser to return with a nonzero value.

A.3.7.4 Interface to the Lexical Analyzer

The yylex() function is an integer-valued function that returns a token number representing the kind of token read. If there is a value associated with the token returned by yylex() (see the discussion of tag above), it shall be assigned to the external variable yylval.
If the parser and yylex() do not agree on these token numbers, reliable communication between them cannot occur. For (one character) literals, the token is simply the numeric value of the character in the current character set. The numbers for other tokens can either be chosen by yacc, or chosen by the user. In either case, the #define construct of C is used to allow yylex() to return these numbers symbolically. The #define statements are put into the code file, and the header file if that file is requested. The set of characters permitted by yacc in an identifier is larger than that permitted by C. Token names found to contain such characters shall not be included in the #define declarations.

If the token numbers are chosen by yacc, the tokens other than literals shall be assigned numbers greater than 256, although no order is implied. A token can be explicitly assigned a number by following its first appearance in the declarations section with a number. Names and literals not defined this way retain their default definition. All assigned token numbers shall be unique and distinct from the token numbers used for literals. If duplicate token numbers cause conflicts in parser generation, yacc shall report an error; otherwise, it is unspecified whether the token assignment is accepted or an error is reported.

The end of the input is marked by a special token called the endmarker, which has a token number that is zero or negative. (These values are invalid for any other token.) All lexical analyzers shall return zero or negative as a token number upon reaching the end of their input. If the tokens up to, but excluding, the endmarker form a structure that matches the start symbol, the parser shall accept the input. If the endmarker is seen in any other context, it shall be considered an error.
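A minimal hand-written yylex() consistent with this interface is sketched below. It is an illustration under stated assumptions, not lex output: it reads from a hypothetical in-memory string input_ptr instead of real input, assumes the token number 257 for NUMBER (normally taken from the header file), and uses the default int value type. Each one-character literal is returned as the character's own numeric value, the value of a NUMBER is stored in yylval, and zero is returned as the endmarker.

```c
#include <assert.h>
#include <ctype.h>

#define NUMBER 257    /* assumed value; normally from the header file */

typedef int YYSTYPE;  /* default value type when %union is not used */
YYSTYPE yylval;

static const char *input_ptr = ""; /* hypothetical input source */

int yylex(void)
{
    /* Skip blanks between tokens. */
    while (*input_ptr == ' ' || *input_ptr == '\t')
        input_ptr++;

    if (*input_ptr == '\0')
        return 0; /* endmarker: zero at the end of the input */

    if (isdigit((unsigned char)*input_ptr)) {
        int value = 0;
        while (isdigit((unsigned char)*input_ptr))
            value = value * 10 + (*input_ptr++ - '0');
        yylval = value; /* value associated with the token */
        return NUMBER;
    }

    /* One-character literal: the token number is the character's value. */
    return (unsigned char)*input_ptr++;
}
```

Repeated calls after the end of input keep returning zero, so the parser sees the endmarker however many times it asks.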
A.3.7.5 Completing the Program

In addition to yyparse() and yylex(), the functions yyerror() and main() are required to make a complete program. The application can supply main() and yyerror(), or those routines can be obtained from the yacc library.

A.3.7.6 yacc Library

The following functions appear only in the yacc library accessible through the -l y operand to c89; they can therefore be redefined by a portable application:

int main(void)
    This function shall call yyparse() and exit with an unspecified value. Other actions within this function are unspecified.

int yyerror(const char *s)
    This function shall write the NUL-terminated argument to standard error, followed by a <newline>.

The order of the -l y and -l l operands given to c89 is significant; the application shall either provide its own main() function or ensure that -l y precedes -l l.

A.3.7.7 Debugging the Parser

The parser generated by yacc shall have diagnostic facilities in it that can be optionally enabled at either compile time or at run time (if enabled at compile time). The compilation of the runtime debugging code is under the control of YYDEBUG, a preprocessor symbol. If YYDEBUG has a nonzero value, the debugging code shall be included. If its value is zero, the code shall not be included. In parsers where the debugging code has been included, the external int yydebug can be used to turn debugging on (with a nonzero value) and off (zero value) at run time. The initial value of yydebug shall be zero. When -t is specified, the code file shall be built such that, if YYDEBUG is not already defined at compilation time (using the c89 -D YYDEBUG option, for example), YYDEBUG shall be set explicitly to 1.
When -t is not specified, the code file shall be built such that, if YYDEBUG is not already defined, it shall be set explicitly to zero. The format of the debugging output is unspecified but includes at least enough information to determine the shift and reduce actions, and the input symbols. It also provides information about error recovery.

A.3.7.8 Algorithms

The parser constructed by yacc implements an LALR(1) parsing algorithm as documented in the literature. It is unspecified whether the parser is table-driven or direct-coded.

A parser generated by yacc shall never request an input symbol from yylex() while in a state where the only actions other than the error action are reductions by a single rule. The literature of parsing theory defines these concepts.

A.3.7.9 Limits

Table A-4 - yacc Internal Limits
_________________________________________________________________________
   Limit         Minimum Maximum   Description
_________________________________________________________________________
   {NTERMS}            126         Number of tokens.
   {NNONTERM}          200         Number of nonterminals.
   {NPROD}             300         Number of rules.
   {NSTATES}           600         Number of states.
   {MEMSIZE}          5200         Length of rules. The total length, in
                                   names (tokens and nonterminals), of all
                                   the rules of the grammar. The left-hand
                                   side is counted for each rule, even if
                                   it is not explicitly repeated, as
                                   specified in A.3.7.1.3.
   {ACTSIZE}          4000         Number of actions. ``Actions'' here (and
                                   in the description file) refer to parser
                                   actions (shift, reduce, etc.), not to
                                   semantic actions defined in A.3.7.1.3.
_________________________________________________________________________

The yacc utility may have several internal tables. The minimum maximums for these tables are shown in Table A-4. The exact meaning of these values is implementation defined. The implementation shall define the relationship between these values, and between them and any error messages that the implementation may generate should it run out of space for any internal structure. An implementation may combine groups of these resources into a single pool as long as the total available to the user does not fall below the sum of the sizes specified by this subclause.

A.3.8 Exit Status

The yacc utility shall exit with one of the following values:

    0   Successful completion.
   >0   An error occurred.

A.3.9 Consequences of Errors

If any errors are encountered, the run is aborted and yacc exits with a nonzero status. Partial code files and header files may be produced. The summary information in the description file shall always be produced if the -v flag is present.

BEGIN_RATIONALE
A.3.10 Rationale. (This subclause is not a part of P1003.2)

The references in the Bibliography may be helpful in constructing the parser generator. The Pennello-DeRemer {B26} paper (along with the works it references) describes a technique to generate parsers that conform to this standard. Work in this area continues to be done, so implementors should consult current literature before doing any new implementations. The original paper by Knuth {B27} is the theoretical basis for this kind of parser, but the tables it generates are impractically large for reasonable grammars, and should not be used.
The ``equivalent to'' wording is intentional, to ensure that the best tables that are LALR(1) can be generated. There has been confusion between the class of grammars, the algorithms needed to generate parsers, and the algorithms needed to parse the languages. They are all reasonably orthogonal. In particular, a parser generator that accepts the full range of LR(1) grammars need not generate a table any more complex than one that accepts SLR(1) (a relatively weak class of LR grammars) for a grammar that happens to be SLR(1). Such an implementation need not recognize the case, either; table compression can yield the SLR(1) table (or one even smaller than that) without recognizing that the grammar is SLR(1). The speed of an LR(1) parser for any class is dependent more upon the table representation and compression (or the code generation if a direct parser is generated) than upon the class of grammar that the table generator handles.

The speed of the parser generator is somewhat dependent upon the class of grammar it handles. However, the original Knuth {B27} algorithm for constructing LR parsers was judged by its author to be impractically slow at that time. Although full LR is more complex than LALR(1), as computer speeds and algorithms improve, the difference (in terms of acceptable wall-clock execution time) is becoming less significant.

Potential authors are cautioned that the Pennello-DeRemer paper previously cited identifies a bug (an oversimplification of the computation of LALR(1) lookahead sets) in some of the LALR(1) algorithm statements that preceded it to publication. They should take the time to seek out that paper, as well as current relevant work, particularly Aho's {B22}.
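To make the A.3.7.8 requirement and the point about table representation concrete, the sketch below checks whether one row of a hypothetical action table consists only of error actions and reductions by a single rule. A table generator could record such a rule as the state's default reduction, which both compresses the table and lets the parser reduce without calling yylex(). The integer encoding (0 for error, k > 0 for shift to state k, k < 0 for reduce by rule -k, with rules numbered from 1) is an assumption for illustration, not the representation used by any particular yacc.

```c
#include <stddef.h>

/* Hypothetical encoding of one parser state's action row, indexed by token:
     0      error action
     k > 0  shift and go to state k
     k < 0  reduce by rule -k (rules are numbered from 1)

   Returns the rule number r if every non-error action in the row is
   "reduce by rule r", so that the parser can reduce without requesting
   a lookahead symbol; returns 0 otherwise. */
int default_reduction(const int *row, size_t ntokens)
{
    int rule = 0;
    size_t t;

    for (t = 0; t < ntokens; t++) {
        if (row[t] == 0)
            continue;               /* error action: does not count */
        if (row[t] > 0)
            return 0;               /* a shift depends on the lookahead */
        if (rule == 0)
            rule = -row[t];         /* first reduction seen */
        else if (rule != -row[t])
            return 0;               /* two different rules: lookahead decides */
    }
    return rule;                    /* 0 if the row holds only error actions */
}
```

A row such as {0, -3, -3, 0} yields rule 3, whereas a row that mixes a shift with a reduction, or reductions by two different rules, yields 0 and the parser must read a lookahead symbol in that state.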
Examples, Usage

Access to the yacc library is obtained with library search operands to c89. To use the yacc library main():

    c89 y.tab.c -l y

Both the lex library and the yacc library contain main(). To access the yacc main():

    c89 y.tab.c lex.yy.c -l y -l l

This ensures that the yacc library is searched first, so that its main() is used.

The historical yacc libraries have contained two simple functions that are normally coded by the application programmer. These library functions are similar to the following code:

    #include <locale.h>

    int main(void)
    {
        extern int yyparse(void);

        setlocale(LC_ALL, "");

        /* If the parser's lexical analyzer was created by lex, the
           application must be careful to ensure that LC_CTYPE and
           LC_COLLATE are set to the POSIX locale. */
        (void) yyparse();
        return (0);
    }

    #include <stdio.h>

    int yyerror(const char *msg)
    {
        /* Write the message to standard error, followed by a newline. */
        (void) fprintf(stderr, "%s\n", msg);
        return (0);
    }

Historical implementations experience name conflicts on the names yacc.tmp, yacc.acts, yacc.debug, y.tab.c, y.tab.h, and y.output if more than one copy of yacc is running in a single directory at one time. The -b option was added to overcome this problem. The related problem of allowing multiple yacc parsers to be placed in the same file was addressed by adding a -p option to override the previously hardcoded yy variable prefix. (The -p option name was selected from a historical implementation.) Implementations will also have to be cognizant of 2.11.6.3, which requires that any temporary files used by yacc also be named to avoid collisions.

The description of the -p option specifies the minimal set of function and variable names that cause conflict when multiple parsers are linked together. YYSTYPE does not need to be changed.
Instead, the programmer can use -b to give the header files for different parsers different names, and then the file with the yylex() for a given parser can include the header for that parser. Names such as yyclearerr do not need to be changed because they are used only in the actions; they do not have linkage. It is possible that an implementation will have other names, either internal ones for implementing things such as yyclearerr, or ones providing nonstandard features, that it wants to change with -p.

The -b option was added to provide a portable method for permitting yacc to work on multiple separate parsers in the same directory. If a directory contains more than one yacc grammar, and both grammars are constructed at the same time (by, say, a parallel make program), conflict results. While the solution is not historical practice, it corrects a known deficiency in historical implementations. Corresponding changes were made to all sections that referenced the filenames y.tab.c (now ``the code file''), y.tab.h (now ``the header file''), and y.output (now ``the description file'').

The grammar for yacc input is based on System V documentation. The textual description there shows that the ; is required at the end of the rule. The grammar and the implementation do not require this. (The use of C_IDENTIFIER causes a reduce to occur in the right place.) Also, in that implementation, constructs such as %token can be terminated by a semicolon, but this is not permitted by the grammar. The keywords such as %token can also appear in uppercase, which is again not discussed. In most places where % is used, \ can be substituted, and there are alternate spellings for some of the symbols (e.g., %LEFT can be %< or even \<).

Multibyte characters should be recognized by the lexical analyzer and returned as tokens. They should not be returned as multibyte character literals.
The token error that is used for error recovery is normally assigned the value 256 in the historical implementation. Thus, the token value 256, which is used in many multibyte character sets, is not available for use as the value of a user-defined token.

Historically, <tag> can contain any characters except >, including white space, in the implementation. However, since the tag must reference a Standard C union member, in practice conforming implementations need only support the set of characters for Standard C identifiers in this context.

Some historical implementations are known to accept actions that are terminated by a period. Historical implementations often allow $ in names. A conforming implementation need support neither of these behaviors.

Unary operators that are the same token as a binary operator in general need their precedence adjusted. This is handled by the %prec advisory symbol associated with the particular grammar rule defining that unary operator. See A. Applications are not required to use this operator for unary operators, but the grammars that do not require it are rare.

Deciding when to use %prec illustrates the difficulty in specifying the behavior of yacc. There may be situations in which the grammar is not, strictly speaking, in error, and yet yacc cannot interpret it unambiguously. Ambiguities in the grammar can in many instances be resolved by providing additional information, such as using %type or %union declarations. It is often easier, and it usually yields a smaller parser, to take this alternative when it is appropriate.

In historical implementations, a program produced without the runtime debugging code is usually smaller and slightly faster.
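A sketch of how the YYDEBUG and yydebug controls described in A.3.7.7 produce that difference: when YYDEBUG is zero, the tracing code is not compiled at all; when it is nonzero, tracing is still off until yydebug is set at run time. The names yy_trace(), yy_demo_shift(), and yy_trace_lines are hypothetical and do not correspond to the internals of any particular generated parser.

```c
#include <stdio.h>

/* What A.3.7.7 describes for a code file built with -t: YYDEBUG defaults
   to 1 unless it is already defined at compilation time (for example,
   c89 -D YYDEBUG=0 ...). Without -t, the default would be 0 instead. */
#ifndef YYDEBUG
#define YYDEBUG 1
#endif

/* Hypothetical counter so that the effect of yydebug can be observed. */
int yy_trace_lines = 0;

#if YYDEBUG
int yydebug = 0;            /* run-time switch; initial value is zero */

static void yy_trace(const char *action, int state)
{
    if (yydebug) {
        (void) fprintf(stderr, "yacc debug: %s in state %d\n", action, state);
        yy_trace_lines++;
    }
}
#else
/* Debugging code compiled out: each trace point costs nothing. */
#define yy_trace(action, state) ((void) 0)
#endif

/* Hypothetical stand-in for the point where a generated parser shifts. */
void yy_demo_shift(int state)
{
    yy_trace("shift", state);
}
```

With the default YYDEBUG of 1, calling yy_demo_shift() writes nothing until the application sets yydebug to a nonzero value.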
There is a fair amount of material in this description of yacc that appears tutorial in nature; some of it has been moved to the Rationale in Draft 9 to simplify the specification. It is hard to avoid because of the need to define terms at least informally. The alternative is to bring in one of the parser generator texts and use its terminology directly, but since there is some variation in that terminology, it was felt that informal definitions, sufficient for someone who understood the concepts to be sure to understand the terms, would make the standard stand alone from any specific text.

Statistics messages from several historical implementations include the following types of information:

    n/512 terminals, n/300 nonterminals
    n/600 grammar rules, n/1500 states
    n shift/reduce, n reduce/reduce conflicts reported
    n/350 working sets used
    memory: states,etc. n/15000, parser n/15000
    n/600 distinct lookahead sets
    n extra closures
    n shift entries, n exceptions
    n goto entries
    n entries saved by goto default
    Optimizer space used: input n/15000, output n/15000
    n table entries, n zero
    maximum spread: n, maximum offset: n

The report of internal tables in the description file is left implementation defined because all aspects of these limits are also implementation defined. Some implementations may use dynamic allocation techniques and have no specific limit values to report.

History of Decisions Made

The format of the y.output file is not given because specification of the format was not seen to enhance application portability. The listing is primarily intended to help human users understand and debug the parser; use of y.output by a portable application script is far-fetched. Furthermore, implementations have not produced consistent output and no clear winner was apparent.
The format selected by the implementation should be human-readable, in addition to the requirement that it be a text file.

Standard error reports are not specifically described because they are seldom of use to portable applications and there was no reason to restrict implementations.

Some implementations recognize ={ as equivalent to {, because it appears in historical documentation. This construction was recognized and documented as obsolete as long ago as 1978, in the original paper Yacc: Yet Another Compiler-Compiler by Stephen C. Johnson. POSIX.2 chose to leave it as obsolete and omit it.

END_RATIONALE

Annex B
(normative)

C Language Bindings Option

This annex describes the C language bindings to the language-independent services described in Section 7. The interfaces described in this annex may be provided by the conforming system; however, any system claiming conformance to the Language-Independent System Services C Language Bindings Option shall provide all of the interfaces described here.

BEGIN_RATIONALE
B.0.1 C Language Bindings Option Rationale. (This subclause is not a part of P1003.2)

In this version of POSIX.2, the language-independent descriptions in Section 7 have not been developed. The language-independent syntax is being created in parallel by the POSIX.1 working group. Therefore, the C language bindings described in this annex are actually the full functional specifications. It is the intention of the POSIX.2 working group to rectify this situation in a revision to this standard, by moving the majority of the functional specifications back into Section 7, leaving Annex B with only brief descriptions of the C bindings to those services.

END_RATIONALE

B.1 C Language Definitions

B.1.1 POSIX Symbols

Certain symbols in this annex are defined in headers. Some of those headers could also define symbols other than those defined by this standard, potentially conflicting with symbols used by the application. Also, this standard defines symbols that other standards do not permit to appear in those headers without some control on the visibility of those symbols.

Symbols called feature test macros are used to control the visibility of symbols that might be included in a header. Implementations, future versions of this standard, and other standards may define additional feature test macros.

The #defines for feature test macros shall appear in the application source code before any #include of a header where a symbol should be visible to some, but not all, applications. If the definition of the macro does not precede the #include, the result is undefined.

Feature test macros shall begin with the underscore character (_) and an uppercase letter, or with two underscore characters.

Implementations may add symbols to the headers shown in Table B-1, provided the identifiers for those symbols begin with the corresponding reserved prefixes in Table B-1. Similarly, implementations may add symbols to the headers in Table B-1 that end in the string indicated as a reserved suffix as long as the reserved suffix is in that part of the name considered significant by the implementation. This shall be in addition to any reservations made in the C Standard {7}.

After the last inclusion of a given header, an application may use any of the symbol classes reserved in Table B-1 for its own purposes, as long as the requirements in the note to Table B-1 are satisfied, noting that the symbol declared in the header may become inaccessible.
Future revisions of this standard, and other POSIX standards, are likely to use symbols in these same reserved spaces.

In addition, implementations may add members to a structure or union without controlling the visibility of those members with a feature test macro, as long as a user-defined macro with the same name cannot interfere with the correct interpretation of the program.

A conforming POSIX.2 application shall define the feature test macro in Table B-2. When an application includes a header and the _POSIX_C_SOURCE feature test macro is defined to be the value 1 or 2, the effect shall be the same as if _POSIX_SOURCE was defined as described in POSIX.1 {8}.

Table B-1 - POSIX.2 Reserved Header Symbols
_________________________________________________________________________
   Header         Key   Reserved Prefix   Reserved Suffix
_________________________________________________________________________
   <fnmatch.h>     2    FNM_
   <glob.h>        1    gl_
   <glob.h>        2    GLOB_
   <limits.h>      1                      _MAX
   <regex.h>       1    re_
   <regex.h>       1    rm_
   <regex.h>       2    REG_
   <wordexp.h>     1    we_
   <wordexp.h>     2    WRDE_
_________________________________________________________________________
NOTE: The Key values are:
   (1) Prefixes and suffixes of symbols that shall not be declared or
       #defined by the application.
   (2) Prefixes and suffixes of symbols that shall be preceded in the
       application with a #undef of that symbol before any other use.

Table B-2 - _POSIX_C_SOURCE
_________________________________________________________________________
   Name              Description
_________________________________________________________________________
   _POSIX_C_SOURCE   Enable POSIX.1 {8} and POSIX.2 symbols; see text.
_________________________________________________________________________

In addition, when the application includes any of the headers defined in this standard, and _POSIX_C_SOURCE is defined to be the value 2:

   (1) All symbols defined in POSIX.2 to appear when the header is included shall be made visible.
   (2) Symbols that are explicitly permitted, but not required, by POSIX.2 to appear in the header (including those in reserved name spaces) may be made visible.
   (3) Additional symbols shall not be made visible, unless controlled by another feature test macro.

The effect of defining the _POSIX_C_SOURCE macro to any other value is unspecified.

If there are no feature test macros present in a program, only the set of symbols defined by the C Standard {7} shall be present. For each feature test macro present, only the symbols specified by that feature test macro plus those of the C Standard {7} shall be defined when the header is included.

BEGIN_RATIONALE
B.1.1.1 POSIX Symbols Rationale. (This subclause is not a part of P1003.2)

When the application defines the _POSIX_C_SOURCE feature test macro with value 2, it must be aware that all of the name space from POSIX.1 {8} and POSIX.2 has been reserved. This does not imply that a POSIX.2 implementation must support POSIX.1 {8}, just that the application must not conflict with an implementation that does. The application can check _POSIX_VERSION and _POSIX2_C_VERSION at compile time to see which standards are supported, if that is necessary. This is primarily an issue for the headers that POSIX.2 shares with POSIX.1 {8}, since other POSIX.1 {8} names appear in other headers not mentioned in POSIX.2.
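A minimal sketch of that compile-time check, assuming a POSIX-style <unistd.h>: _POSIX_C_SOURCE is defined as 2 before any #include, as B.1.1 requires, and the two functions report the values of _POSIX_VERSION and _POSIX2_C_VERSION, or 0 when the corresponding macro is not defined. The function names are hypothetical.

```c
/* The feature test macro must precede every #include (B.1.1). */
#define _POSIX_C_SOURCE 2
#include <unistd.h>

/* POSIX.1 version advertised at compile time, or 0 if not advertised. */
long posix1_version(void)
{
#ifdef _POSIX_VERSION
    return _POSIX_VERSION;
#else
    return 0L;
#endif
}

/* POSIX.2 C-binding version (see Table B-4), or 0 if not advertised. */
long posix2_c_version(void)
{
#ifdef _POSIX2_C_VERSION
    return _POSIX2_C_VERSION;
#else
    return 0L;
#endif
}
```

Because both checks are resolved by the preprocessor, an application can use the same #ifdef tests directly to select code paths at compile time.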
It is expected that C bindings to future POSIX standards and revisions will define new values for _POSIX_C_SOURCE, with each new value reserving the name space for that new standard or revision, plus all earlier POSIX standards. Using a single feature test macro for all standards, rather than a separate macro for each standard, furthers the goal of eventually combining all of the C bindings into one standard, which will be included in an international standard that refers to a language-independent ISO/IEC 9945-1 {8}.

END_RATIONALE

B.1.2 Headers and Function Prototypes

Implementations shall declare function prototypes for all functions. Each function prototype shall appear in the header included in the synopsis of the function.

B.1.3 Error Numbers

Some of the functions in this annex use the variable errno to report errors. Such usage is documented in the Errors subclause of each function specification. The usage of errno and the meanings of the symbolic names shall be as defined in POSIX.1 {8} B.1.3.

BEGIN_RATIONALE
B.1.4 C Language Definitions Rationale. (This subclause is not a part of P1003.2)

This clause clarifies the interface to the C Standard {7}. The description was taken from POSIX.1, with one important modification. Since POSIX.1 {8} and the C Standard {7} were being developed and approved at about the same time, POSIX.1 {8} allowed ``Common Usage C'' implementations to give system vendors time to develop Standard C interfaces. Since Standard C compilers are now commonly available, POSIX.2 does not explicitly describe the binding to Common Usage C. However, such a binding would be straightforward, as long as the rules for Common Usage C in POSIX.1 are followed.
END_RATIONALE

B.2 C Numerical Limits

The following subclauses list the names of macros that C language applications can use to obtain minimum and current values for limits defined in 2.13.1.

BEGIN_RATIONALE
B.2.0.1 C Numerical Limits Rationale. (This subclause is not a part of P1003.2)

This subclause was added in Draft 9 to give C applications access to limits at compile time. Applications can use the values from the macros without resorting to sysconf(). The descriptions very closely follow the descriptions of macros and limits in POSIX.1 {8}. This definition of the limits is specific to the C language. Other language bindings might use different interfaces or names to provide equivalent information to the application.

Note that there are no C bindings or interfaces that change based on the macros in Table B-5. These macros only advertise the availability of the associated utilities.

END_RATIONALE

B.2.1 C Macros for Symbolic Limits

The macros in Table B-3 shall be defined in the header <limits.h>. They specify values for the symbolic limits defined in 2.13.1.
Table B-3 - C Macros for Symbolic Limits
_________________________________________________________________________
                        Minimum Allowed              Minimum for this
   Symbolic Limit       by POSIX.2                   Implementation
_________________________________________________________________________
   {BC_BASE_MAX}        _POSIX2_BC_BASE_MAX          BC_BASE_MAX
   {BC_DIM_MAX}         _POSIX2_BC_DIM_MAX           BC_DIM_MAX
   {BC_SCALE_MAX}       _POSIX2_BC_SCALE_MAX         BC_SCALE_MAX
   {BC_STRING_MAX}      _POSIX2_BC_STRING_MAX        BC_STRING_MAX
   {COLL_WEIGHTS_MAX}   _POSIX2_COLL_WEIGHTS_MAX     COLL_WEIGHTS_MAX
   {EXPR_NEST_MAX}      _POSIX2_EXPR_NEST_MAX        EXPR_NEST_MAX
   {LINE_MAX}           _POSIX2_LINE_MAX             LINE_MAX
   {RE_DUP_MAX}         _POSIX2_RE_DUP_MAX           RE_DUP_MAX
_________________________________________________________________________

The names in the first column of Table B-3 are symbolic limits as defined in 2.13.1. The names in the second column are C macros that define the smallest values permitted for the symbolic limits on any POSIX.2 implementation; they shall be defined as constant expressions with the most restrictive values specified in 2.13.1. The names in the third column are C macros that define less restrictive values provided by the implementation; each shall be defined as a constant that:

   - is not smaller than the associated macro in column 2, and
   - is not larger than the smallest value that will be returned by sysconf() when the application is executed.

BEGIN_RATIONALE
B.2.1.1 C Macros for Symbolic Limits Rationale. (This subclause is not a part of P1003.2)

The macros in column 3 of Table B-3 are required to be constant expressions.
If the C binding is to be used with POSIX.2 implementations over which the implementor of the binding has no control, the column-3 values must be the same as those in column 2. If the implementation of the C binding is intended to be used with a POSIX.2 implementation that always supports a larger value than one in column 2, that implementation of the binding may use the larger value for the column-3 macro. If an application compiled with that binding is then used with a different POSIX.2 implementation, it is the user's fault that the application is being run in an environment for which it was not intended. The application can assume, for example, that the stream created by popen("mailx user","w") will accept lines of length {LINE_MAX}, even if this is larger than {_POSIX2_LINE_MAX}. However, if the application is creating a data file that might be processed on another implementation, it should use the values in column 2.

END_RATIONALE

B.2.2 Compile-Time Symbolic Constants for Portability Specifications

The macros in Table B-4 shall be defined in the header <unistd.h>. These macros can be used by the application, at compile time, to determine which optional facilities are present and what actions shall be taken by the implementation.

BEGIN_RATIONALE
B.2.2.1 Compile-Time Symbolic Constants for Portability Specifications Rationale. (This subclause is not a part of P1003.2)

The symbolic constant _POSIX2_C_VERSION is analogous to _POSIX_VERSION, defined in POSIX.1 {8}. It indicates the version of the C interfaces that are supplied by the compiler and runtime library. The version of the utilities is given by the {POSIX2_VERSION} limit (see 2.13.1), whose value can be obtained at runtime using sysconf() (see B.10.2).

END_RATIONALE
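The column-2 versus column-3 distinction in Table B-3, and the advice in its rationale, might be applied to {LINE_MAX} as in the following sketch: a program that only exchanges lines with utilities on the system where it runs can use the implementation's value, while one writing data files for other systems limits itself to _POSIX2_LINE_MAX. The function names are hypothetical, and falling back to the compile-time LINE_MAX when sysconf() reports -1 is an assumption of the sketch.

```c
/* The feature test macro must precede every #include (B.1.1). */
#define _POSIX_C_SOURCE 2
#include <limits.h>
#include <unistd.h>

/* Line length this program may assume on the system where it runs. */
long local_line_max(void)
{
    long v = sysconf(_SC_LINE_MAX);   /* run-time value, if reported */

    if (v == -1)
        v = LINE_MAX;                 /* column 3: compile-time value */
    return v;
}

/* Line length to assume for data files that may be processed elsewhere. */
long portable_line_max(void)
{
    return _POSIX2_LINE_MAX;          /* column 2: same on every system */
}
```

On an implementation whose {LINE_MAX} exceeds the minimum, local_line_max() returns the larger value, while portable_line_max() returns the same value on every conforming system.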
Table B-4 - C Compile-Time Symbolic Constants
_________________________________________________________________________
   Macro Name          Description
_________________________________________________________________________
   _POSIX2_C_VERSION   The integer value 199???L. This value indicates
                       the version of the interfaces in this annex that
                       are provided by the implementation. It will change
                       with each published version of this standard to
                       indicate the 4-digit year and 2-digit month that
                       the standard was approved by the IEEE Standards
                       Board.
_________________________________________________________________________

B.2.3 Execution-Time Symbolic Constants for Portability Specifications

The macros in Table B-5 can be used by the application at execution time to determine which optional facilities are present. If a macro is defined to have the value -1 in the header <unistd.h>, the implementation shall never provide that feature when the application runs under that implementation. If a macro is defined to have a value other than -1, the implementation shall always provide that feature. If the macro is undefined, the sysconf() function (see B.10.2) can be used to determine whether the feature is provided for a particular invocation of the application.
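The three cases described above can be written out for _POSIX2_C_DEV as follows. This is a sketch that assumes a <unistd.h> providing the Table B-5 macros and the sysconf() name _SC_2_C_DEV; the function name have_c_dev() is hypothetical, and treating any positive sysconf() result as ``provided'' is an assumption.

```c
/* The feature test macro must precede every #include (B.1.1). */
#define _POSIX_C_SOURCE 2
#include <unistd.h>

/* Returns 1 if the C Language Development Utilities Option is available
   to this invocation of the application, 0 if it is not:
     macro defined as -1       -> never provided
     macro defined, not -1     -> always provided
     macro not defined         -> ask sysconf() at run time
   (The macro is assumed to have a value, as B.2.3 implies.) */
int have_c_dev(void)
{
#if defined(_POSIX2_C_DEV) && _POSIX2_C_DEV == -1
    return 0;
#elif defined(_POSIX2_C_DEV)
    return 1;
#else
    return sysconf(_SC_2_C_DEV) > 0;
#endif
}
```

The same pattern applies to the other macros in Table B-5, each paired with its own sysconf() name.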
Table B-5 - C Execution-Time Symbolic Constants
_________________________________________________________________________
   Macro Name          Description
_________________________________________________________________________
   _POSIX2_C_DEV       The system supports the C Language Development
                       Utilities Option (see Annex A).
   _POSIX2_FORT_DEV    The system supports the FORTRAN Development
                       Utilities Option (see Annex C).
   _POSIX2_FORT_RUN    The system supports the FORTRAN Runtime Utilities
                       Option (see Annex C).
   _POSIX2_LOCALEDEF   The system supports the creation of locales as
                       described in 4.35.
   _POSIX2_SW_DEV      The system supports the Software Development
                       Utilities Option (see Section 6).
_________________________________________________________________________

B.2.4 POSIX.1 C Numerical Limits

The macros specified in POSIX.1 {8} to provide compile-time values for the configurable variables in Table 7-1 (see 7.8.2) shall also be visible in a POSIX.2 system. Other macros required by POSIX.1 {8} 2.9 (Numerical Limits) and 2.10 (Symbolic Constants) may also be visible in a POSIX.2 system.

BEGIN_RATIONALE
B.2.4.1 POSIX.1 C Numerical Limits Rationale. (This subclause is not a part of P1003.2)

Subclause 7.8.2 requires that certain POSIX.1 {8} configurable variables be visible in POSIX.2. Subclause B.2.4 ensures that POSIX.2 C applications can obtain these variables using the same macros as POSIX.1 {8} C applications. It also allows an implementation to make all of the POSIX.1 {8} macros available even if _POSIX_SOURCE is not set.
It also allows an implementation to make all of the POSIX.1 {8} symbols available even if it does not support all of POSIX.1 {8}.

END_RATIONALE

B.3 C Binding for Shell Command Interface

BEGIN_RATIONALE

B.3.0.1 C Binding for Shell Command Interface Rationale.

(This subclause is not a part of P1003.2)

The system() and popen() functions should not be used by programs that have set-user-ID (or set-group-ID) privileges, as defined in POSIX.1 {8}. The fork() and exec family of functions [except execlp() and execvp()], also defined in POSIX.1 {8}, should be used instead. This prevents any unforeseen manipulation of the user's environment that could cause execution of commands not anticipated by the calling program.

If the original and ``popen()ed'' processes both intend to read, write, or read and write a common file, and either will be using FILE-type C functions [fread(), fwrite(), etc.], the rules in POSIX.1 {8} 8.2.3 must be observed.

END_RATIONALE

B.3.1 C Binding for Execute Command

Function: system()

B.3.1.1 Synopsis

     #include <stdlib.h>
     int system(const char *command);

B.3.1.2 Description

This standard requires the system() function as described in the C Standard {7}.

The system() function shall execute the command specified by the string pointed to by command. The environment of the executed command shall be as if a child process were created using the POSIX.1 {8} fork() function, and the child process invoked the sh utility (see 4.56) using the POSIX.1 {8} execl() function as follows:

     execl(<shell path>, "sh", "-c", command, (char *)0);

where <shell path> is an unspecified pathname for the sh utility.
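A caller-side sketch of these rules, combined with the return-value format given in B.3.1.3, follows. The helper name run_command() is hypothetical; it uses the POSIX.1 wait-status macros WIFEXITED(), WEXITSTATUS(), and WIFSIGNALED() from <sys/wait.h> to decode the value returned by system(), and it assumes the caller wants -1 for every outcome other than a normal exit.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run cmd through system() and return the
 * command's exit status (0-255), or -1 if no command interpreter is
 * available, system() itself returned -1 (errno is then set), or the
 * shell did not exit normally (for example, it was killed by a signal).
 * An exit status of 127 may also mean the shell could not be executed.
 */
int run_command(const char *cmd)
{
    int status;

    if (system(NULL) == 0)          /* nonconforming: no interpreter */
        return -1;

    status = system(cmd);
    if (status == -1)               /* child not created or status lost */
        return -1;
    if (WIFSIGNALED(status))        /* e.g. the user typed ^C */
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                      /* any other status: treat as failure */
}
```

For example, run_command("exit 3") returns 3, and run_command("true") returns 0.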
The system() function shall ignore the SIGINT and SIGQUIT signals, and block the SIGCHLD signal, while waiting for the command to terminate. If this might cause the application to miss a signal that would have killed it, the application should examine the return value from system() and take whatever action is appropriate if the command terminated due to receipt of a signal.

The system() function shall not affect the termination status of any child of the calling process other than the process(es) it itself creates.

The system() function shall not return until the child process has terminated.

B.3.1.3 Returns

If command is NULL, the system() function shall return nonzero.

If command is not NULL, the system() function shall return the termination status of the command language interpreter in the format specified by the waitpid() function in POSIX.1 {8}. The termination status of the command language interpreter is as specified for the sh utility, except that if some error prevents the command language interpreter from executing after the child process is created, the return value from system() shall be as if the command language interpreter had terminated using exit(127) or _exit(127). If a child process cannot be created, or if the termination status for the command language interpreter cannot be obtained, system() shall return -1 and set errno to indicate the error.

B.3.1.4 Errors

The system() function may set errno values as described by fork() in POSIX.1 {8}.

BEGIN_RATIONALE

B.3.1.5 Rationale.
(This subclause is not a part of P1003.2)

The C Standard {7} specifies that when command is NULL, system() returns nonzero if a command interpreter is available and zero if one is not. At first reading, POSIX.2 might appear to conflict with this, since it requires system(NULL) to always return nonzero. There is no conflict, however. A POSIX.2 implementation must always have a command interpreter available, and is nonconforming if none is present. It is therefore permissible for the system() function on a POSIX.2 system to implement the behavior specified by the C Standard {7}, as long as it is understood that the implementation is not POSIX.2 conforming if system(NULL) returns zero.

Note that, while system() must ignore SIGINT and SIGQUIT and block SIGCHLD while waiting for the child to terminate, the handling of signals in the executed command is as specified by fork() and exec. For example, if SIGINT is being caught or is set to SIG_DFL when system() is called, the child will be started with SIGINT handling set to SIG_DFL.

Ignoring SIGINT and SIGQUIT in the parent process prevents coordination problems (two processes reading from the same terminal, for example) when the executed command ignores or catches one of the signals. It is also usually the correct action when the user has given a command to the application to be executed synchronously (as in the ``!'' command in many interactive applications). In either case, the signal should be delivered only to the child process, not to the application itself. There is one situation where ignoring the signals might have less than the desired effect. This is when the application uses system() to perform some task invisible to the user.
If the user types the interrupt character (^C, for example) while system() is being used in this way, one would expect the application to be killed, but only the executed command will be killed. Applications that use system() in this way should carefully check the return status from system() to see whether the executed command was successful, and should take appropriate action when the command fails.

Blocking SIGCHLD while waiting for the child to terminate prevents the application from catching the signal and obtaining status from system()'s child process before system() can get the status itself.

Examples, Usage

The context in which the utility is ultimately executed may differ from that in which the system() function was called. For example, file descriptors that have the FD_CLOEXEC flag set will be closed, and the process ID and parent process ID will be different. Also, if the executed utility changes its environment variables or its current working directory, that change will not be reflected in the caller's context.

Earlier drafts of this standard required, or allowed, system() to return with errno set to [EINTR] if it was interrupted by a signal. This error return was removed, and a requirement that system() not return until the child has terminated was added. This means that if a waitpid() call in system() exits with errno set to [EINTR], system() must reissue the waitpid(). This change was made for two reasons:

(1) There is no way for an application to clean up if system() returns [EINTR], short of calling wait(), and that could have the undesirable effect of returning the status of children other than the one started by system().
(2) While it might require a change in some historical implementations, those implementations already have to be changed because they use wait() instead of waitpid().

Note that if the application is catching SIGCHLD signals, it will receive such a signal before a successful system() call returns.

History of Decisions Made

The C Standard {7} requires that a call to system() with a NULL argument return a nonzero value when a command language interpreter is available to the system.

It was explicitly decided that when command is NULL, system() should not be required to check that the command language interpreter actually exists with the correct mode, that there are enough processes to execute it, etc. The call system(NULL) could, theoretically, check for such problems as too many existing child processes, and return zero. However, it would be inappropriate to return zero due to such a (presumably) transient condition. If some condition exists that is not under the control of this application and that would cause any system() call to fail, that system has been rendered nonconformant.

Modified in Draft 6 to reflect the availability of the waitpid() function in POSIX.1 {8}. To conform to this standard, system() must use waitpid(), or some similar function, instead of wait().

Figure B-1 illustrates how system() might be implemented on a POSIX.1 {8} implementation.

Note that, while a particular implementation of system() (such as the one in Figure B-1) can assume a particular path for the shell, such a path is not necessarily valid on another system. The example in Figure B-1 is not portable, and is not intended to be.
There is no defined way for an application to find the specific path for the shell. However, confstr() can provide a value for PATH that is guaranteed to find the sh utility.

One reviewer suggested that an implementation of system() might want to use an environment variable such as SHELL to determine which command interpreter to use. The supposed implementation would use the default command interpreter if the one specified by the environment variable was not available. This would allow a user, when using an application that prompts for command lines to be processed using system(), to specify a different command interpreter. Such an implementation is discouraged. If the alternate command interpreter did not follow the command line syntax specified in POSIX.2, then changing SHELL would render system() nonconformant. This would affect applications that expected the specified behavior from system(), and since this standard does not mention that SHELL affects system(), the application would not know that it needed to unset SHELL.

END_RATIONALE

B.3.2 C Binding for Pipe Communications with Programs

Functions: popen(), pclose()

B.3.2.1 Synopsis

     #include <stdio.h>
     FILE *popen(const char *command, const char *mode);
     int pclose(FILE *stream);

B.3.2.2 Description

The popen() function shall execute the command specified by the string pointed to by command. It shall create a pipe between the calling program and the executed command, and return a pointer to a C Standard {7} stream that can be used to either read from or write to the pipe.

The pclose() function shall close the stream, wait for the command to terminate, and return the termination status from the command language interpreter.
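The following is a minimal sketch of reading a command's output through popen() in mode "r" and collecting the termination status with pclose(). The helper read_first_line() is hypothetical. It keeps only the first line of output; because pclose() closes the read end of the pipe before waiting, a command that tries to write more output after that point typically receives SIGPIPE rather than blocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: run cmd with popen() in mode "r", copy the first
 * line of its standard output into buf (without the trailing newline),
 * and return the termination status reported by pclose().  Returns -1
 * if size is zero or popen() fails.
 */
int read_first_line(const char *cmd, char *buf, size_t size)
{
    FILE *fp;

    if (size == 0)
        return -1;
    fp = popen(cmd, "r");
    if (fp == NULL)
        return -1;

    /* The child's STDOUT_FILENO is the writable end of this pipe. */
    if (fgets(buf, (int)size, fp) != NULL)
        buf[strcspn(buf, "\n")] = '\0';   /* drop the newline, if any */
    else
        buf[0] = '\0';                    /* no output at all */

    /* pclose() closes the stream and waits for the command to end. */
    return pclose(fp);
}
```

For instance, read_first_line("echo hello", buf, sizeof buf) leaves "hello" in buf and returns 0.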
_________________________________________________________________________

#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int system(const char *cmd)
{
    int stat;
    pid_t pid;
    struct sigaction sa, savintr, savequit;
    sigset_t saveblock;

    if (cmd == NULL)
        return(1);

    /* Ignore SIGINT and SIGQUIT in the parent, saving the old actions. */
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigemptyset(&savintr.sa_mask);
    sigemptyset(&savequit.sa_mask);
    sigaction(SIGINT, &sa, &savintr);
    sigaction(SIGQUIT, &sa, &savequit);

    /* Block SIGCHLD, saving the old signal mask. */
    sigaddset(&sa.sa_mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &sa.sa_mask, &saveblock);

    if ((pid = fork()) == 0) {
        /* Child: restore the caller's signal actions and mask. */
        sigaction(SIGINT, &savintr, (struct sigaction *)0);
        sigaction(SIGQUIT, &savequit, (struct sigaction *)0);
        sigprocmask(SIG_SETMASK, &saveblock, (sigset_t *)0);
        execl("/bin/sh", "sh", "-c", cmd, (char *)0);
        _exit(127);             /* the shell could not be executed */
    }

    if (pid == -1) {
        stat = -1;              /* errno comes from fork() */
    } else {
        /* Retry on EINTR; never return before the child terminates. */
        while (waitpid(pid, &stat, 0) == -1) {
            if (errno != EINTR) {
                stat = -1;
                break;
            }
        }
    }

    /* Parent: restore the caller's signal actions and mask. */
    sigaction(SIGINT, &savintr, (struct sigaction *)0);
    sigaction(SIGQUIT, &savequit, (struct sigaction *)0);
    sigprocmask(SIG_SETMASK, &saveblock, (sigset_t *)0);

    return(stat);
}
_________________________________________________________________________
                Figure B-1 - Sample system() Implementation

The environment of the executed command shall be as if a child process were created within the popen() call using the fork() function, and the child invoked the sh utility using the call:

     execl(<shell path>, "sh", "-c", command, (char *)0);

where <shell path> is an unspecified pathname for the sh utility.
However, popen() shall ensure that any streams from previous popen() calls that remain open in the parent process are closed in the new child process.

The mode argument to popen() is a string that specifies the I/O mode:

(1) If mode is "r", when the child process is started its file descriptor STDOUT_FILENO shall be the writable end of the pipe, and the file descriptor fileno(stream) in the calling process, where stream is the stream pointer returned by popen(), shall be the readable end of the pipe.

(2) If mode is "w", when the child process is started its file descriptor STDIN_FILENO shall be the readable end of the pipe, and the file descriptor fileno(stream) in the calling process, where stream is the stream pointer returned by popen(), shall be the writable end of the pipe.

(3) If mode is any other value, the result is undefined.

A stream opened by popen() should be closed by pclose().

As stated above, pclose() shall return the termination status from the command language interpreter. However, if the application has called any of the following:

(1) wait(),

(2) waitpid() with a pid argument less than or equal to zero or equal to the process ID of the command language interpreter, or

(3) any other function not defined in POSIX.1 {8} or POSIX.2 that could do one of the above,

and one of those calls caused the termination status to be unavailable to pclose(), then pclose() shall return -1 with errno set to [ECHILD] to report this situation. In any case, pclose() shall not return before the child process created by popen() has terminated.

If the command language interpreter cannot be executed, the child termination status returned by pclose() shall be as if the command language interpreter terminated using exit(127) or _exit(127). If it can be executed, the exit() value shall be as described for the sh utility.
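Mode "w" works in the opposite direction: what the caller writes to the stream arrives on the command's standard input. The sketch below is a hypothetical helper, feed_command(), that writes a string to a command and then decodes the status returned by pclose() with the POSIX.1 wait-status macros. One caution follows from pipe semantics rather than from this subclause: if the command exits without reading its input, the caller's write (which for small amounts of data happens when pclose() flushes the stream) can raise SIGPIPE in the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: start cmd with popen() in mode "w", write text
 * to the command's standard input, and close the stream with pclose().
 * Returns the command's exit status (0-255), or -1 if popen() or
 * pclose() fails or the shell did not exit normally.  An exit status
 * of 127 may indicate that the shell could not be executed.
 */
int feed_command(const char *cmd, const char *text)
{
    FILE *fp;
    int status;

    fp = popen(cmd, "w");
    if (fp == NULL)
        return -1;

    /* Buffered data reaches the child's STDIN_FILENO when flushed. */
    fputs(text, fp);

    /* pclose() flushes and closes the stream, then waits for the child. */
    status = pclose(fp);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```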
The pclose() function shall not affect the termination status of any child of the calling process other than the one created by popen() for the associated stream.

If the argument stream to pclose() is not a pointer to a stream created by popen(), the result of pclose() is undefined.

After popen(), both the parent and the child process shall be capable of executing independently before either terminates. See 2.9.1.2.

B.3.2.3 Returns

The popen() function shall return a NULL pointer if the pipe or subprocess cannot be created. Otherwise, it shall return a stream pointer as described above.

Upon successful return, pclose() shall return the termination status of the command language interpreter. Otherwise, pclose() shall return -1 and set errno to indicate the error.

B.3.2.4 Errors

If any of the following conditions are detected, the popen() function shall return NULL and set errno to the corresponding value:

     [EINVAL]    The mode argument is invalid.

The popen() function may also set errno values as described by the POSIX.1 {8} fork() or pipe() functions.

If any of the following conditions are detected, the pclose() function shall return -1 and set errno to the corresponding value:

     [ECHILD]    The status of the child process could not be obtained, as described above.

BEGIN_RATIONALE

B.3.2.5 Rationale.

(This subclause is not a part of P1003.2)

Examples, Usage

Because open files are shared, a mode "r" command can be used as an input filter and a mode "w" command as an output filter.

The behavior of popen() is specified for modes of "r" and "w".
Other modes such as "rb" and "wb" might be supported by specific implementations, but these would not be portable features. Note that historical implementations of popen() only check whether the first character of mode is r. Thus, a mode of "robert the robot" would be treated as mode "r", and a mode of "anything else" would be treated as mode "w".

If the application calls waitpid() with a pid argument greater than zero, and it still has a popen()ed stream open, it must ensure that pid does not refer to the process started by popen().

History of Decisions Made

There is a requirement that pclose() not return before the child process terminates. This is intended to disallow implementations that return [EINTR] if a signal is received while waiting. If pclose() returned before the child terminated, there would be no way for the application to discover which child used to be associated with the stream, and it could not do the cleanup itself.

If the stream pointed to by stream was not created by popen(), historical implementations of pclose() return -1 without setting errno. To avoid requiring pclose() to set errno in this case, this standard makes the behavior undefined. An application should not use pclose() to close any stream that was not created by popen().

Wording was added in Draft 10 requiring that the parent and child processes be able to execute independently. This behavior has been the intent all along, and the specific words were taken from the current draft of the POSIX.1a revision to POSIX.1 {8}. Rationale about this wording appears in B.3.1.1 of POSIX.1a.
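Because a historical popen() may look only at the first character of mode, an application that wants invalid modes rejected consistently can check the mode itself before calling popen(). The wrapper below, strict_popen(), is a hypothetical sketch: it accepts only the two portable modes and otherwise fails with [EINVAL], the error this standard lists for an invalid mode argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical wrapper: accept only the portable modes "r" and "w".
 * Any other mode (including "rb", "wb", or "robert the robot") is
 * rejected with errno set to EINVAL instead of being passed to popen().
 */
FILE *strict_popen(const char *command, const char *mode)
{
    if (mode == NULL || (strcmp(mode, "r") != 0 && strcmp(mode, "w") != 0)) {
        errno = EINVAL;
        return NULL;
    }
    return popen(command, mode);
}
```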
Some historical implementations either block or ignore the signals SIGINT, SIGQUIT, and SIGHUP while waiting for the child process to terminate. Since this behavior is not described in POSIX.2, such implementations are not conforming. Also, some historical implementations return [EINTR] if a signal is received, even though the child process has not terminated. Such implementations are also considered nonconforming.

Consider, for example, an application that uses

     popen("command", "r")

to start command, which is part of the same application. The parent writes a prompt to its standard output (presumably the terminal) and then reads from the popen()ed stream. The child reads the response from the user, does some transformation on the response (pathname expansion, perhaps), and writes the result to its standard output. The parent process reads the result from the pipe, does something with it, and prints another prompt. The cycle repeats. Assuming that both processes do appropriate buffer flushing, this would be expected to work.

Modified in Draft 6 to reflect the availability of the waitpid() function in POSIX.1 {8}. To conform to this standard, pclose() must use waitpid(), or some similar function, instead of wait().

Figure B-2 illustrates how the pclose() function might be implemented on a POSIX.1 {8} system.
_________________________________________________________________________

#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

int pclose(FILE *stream)
{
    int stat;
    pid_t pid;

    pid = <pid for process created for stream by popen()>;
    (void) fclose(stream);

    /* Retry on EINTR; never return before the child terminates. */
    while (waitpid(pid, &stat, 0) == -1) {
        if (errno != EINTR) {
            stat = -1;
            break;
        }
    }
    return(stat);
}
_________________________________________________________________________
                Figure B-2 - Sample pclose() Implementation

END_RATIONALE

B.4 C Binding for Access Environment Variables

Function: getenv()

The C language binding to the service described in 7.2 shall be the POSIX.1 {8} getenv() function.

B.5 C Binding for Regular Expression Matching

Functions: regcomp(), regexec(), regfree(), regerror()

B.5.1 Synopsis

     #include <sys/types.h>
     #include <regex.h>

     int regcomp(regex_t *preg, const char *pattern, int cflags);
     int regexec(const regex_t *preg, const char *string,
                 size_t nmatch, regmatch_t pmatch[], int eflags);
     size_t regerror(int errcode, const regex_t *preg,
                     char *errbuf, size_t errbuf_size);
     void regfree(regex_t *preg);

B.5.2 Description

These functions shall interpret basic and extended regular expressions, as described in 2.8.

The header <regex.h> shall define the structure types regex_t and regmatch_t.

The structure type regex_t shall include at least the member shown in Table B-6.

The structure type regmatch_t shall contain at least the members shown in Table B-7.

The type regoff_t, which shall be defined in <regex.h>, shall be a signed arithmetic type that can hold the largest value that can be stored in either an off_t or a ssize_t.
The regcomp() function shall compile the regular expression contained in the string pointed to by the pattern argument and place the results in the structure pointed to by preg. The cflags argument shall be the bitwise inclusive OR of zero or more of the flags shown in Table B-8, which shall be defined in the header <regex.h>.

                Table B-6 - Structure Type regex_t
_________________________________________________________________________
 Member Type   Member Name   Description
_________________________________________________________________________
 size_t        re_nsub       Number of parenthesized subexpressions.
_________________________________________________________________________

                Table B-7 - Structure Type regmatch_t
_________________________________________________________________________
 Member Type   Member Name   Description
_________________________________________________________________________
 regoff_t      rm_so         Byte offset from start of string to start
                             of substring.
 regoff_t      rm_eo         Byte offset from start of string of the
                             first character after the end of substring.
_________________________________________________________________________
                Table B-8 - regcomp() cflags Argument
_________________________________________________________________________
 Flag           Description
_________________________________________________________________________
 REG_EXTENDED   Use Extended Regular Expressions.
 REG_ICASE      Ignore case in match. See 2.8.2.
 REG_NOSUB      Report only success/fail in regexec().
 REG_NEWLINE    Change the handling of <newline>, as described in the
                text.
_________________________________________________________________________

                Table B-9 - regexec() eflags Argument
_________________________________________________________________________
 Flag           Description
_________________________________________________________________________
 REG_NOTBOL     The first character of the string pointed to by string
                is not the beginning of the line. Therefore, the
                circumflex character (^), when taken as a special
                character, shall not match the beginning of string.
 REG_NOTEOL     The last character of the string pointed to by string
                is not the end of the line. Therefore, the dollar sign
                ($), when taken as a special character, shall not match
                the end of string.
_________________________________________________________________________

The default regular expression type for pattern shall be a Basic Regular Expression. The application can specify Extended Regular Expressions using the REG_EXTENDED cflags flag.

If the function regcomp() succeeds, it shall return zero; otherwise, it shall return nonzero, and the content of preg shall be undefined.

If the REG_NOSUB flag was not set in cflags, then regcomp() shall set re_nsub to the number of parenthesized subexpressions [delimited by \( \) in basic regular expressions or ( ) in extended regular expressions] found in pattern.

The regexec() function shall compare the null-terminated string specified by string against the compiled regular expression preg initialized by a previous call to regcomp(). If it finds a match, regexec() shall return zero; otherwise, it shall return nonzero, indicating either no match or an error. The eflags argument shall be the bitwise inclusive OR of zero or more of the flags shown in Table B-9, which shall be defined in the header <regex.h>.

If nmatch is zero or REG_NOSUB was set in the cflags argument to regcomp(), then regexec() shall ignore the pmatch argument. Otherwise, the pmatch argument shall point to an array with at least nmatch elements, and regexec() shall fill in the elements of that array with offsets of the substrings of string that correspond to the parenthesized subexpressions of pattern: pmatch[i].rm_so shall be the byte offset of the beginning and pmatch[i].rm_eo shall be one greater than the byte offset of the end of substring i. (Subexpression i begins at the ith matched open parenthesis, counting from 1.)
Offsets in pmatch[0] shall identify the substring that corresponds to the entire regular expression. Unused elements of pmatch up to pmatch[nmatch-1] shall be filled with -1. If there are more than nmatch subexpressions in pattern (pattern itself counts as a subexpression), then regexec() shall still do the match, but shall record only the first nmatch substrings.

When matching a basic or extended regular expression, any given parenthesized subexpression of pattern might participate in the match of several different substrings of string, or it might not match any substring even though the pattern as a whole did match. The following rules shall be used to determine which substrings to report in pmatch when matching regular expressions:

(1) If subexpression i in a regular expression is not contained within another subexpression, and it participated in the match several times, then the byte offsets in pmatch[i] shall delimit the last such match.

(2) If subexpression i is not contained within another subexpression, and it did not participate in an otherwise successful match, then the byte offsets in pmatch[i] shall be -1. A subexpression shall not participate in the match when:

    (a) * or \{ \} appears immediately after the subexpression in a basic regular expression, or *, ?, or { } appears immediately after the subexpression in an extended regular expression, and the subexpression did not match (matched zero times), or

    (b) | is used in an extended regular expression to select this subexpression or another, and the other subexpression matched.
(3) If subexpression i is contained within another subexpression j, and i is not contained within any other subexpression that is contained within j, and a match of subexpression j is reported in pmatch[j], then the match or nonmatch of subexpression i reported in pmatch[i] shall be as described in (1) and (2) above, but within the substring reported in pmatch[j] rather than the whole string.

(4) If subexpression i is contained in subexpression j, and the byte offsets in pmatch[j] are -1, then the byte offsets in pmatch[i] also shall be -1.

(5) If subexpression i matched a zero-length string, then both byte offsets in pmatch[i] shall be the byte offset of the character or null terminator immediately following the zero-length string.

If, when regexec() is called, the locale is different from when the regular expression was compiled, the result is undefined.

If REG_NEWLINE is not set in cflags, then a <newline> character in pattern or string shall be treated as an ordinary character. If REG_NEWLINE is set, then <newline> shall be treated as an ordinary character except as follows:

(1) A <newline> in string shall not be matched by a period outside of a bracket expression (see 2.8.3.1.3) or by any form of a nonmatching list (see 2.8.3.2).

(2) A circumflex (^) in pattern, when used to specify expression anchoring (see 2.8.4.4 and 2.8.4.6), shall match the zero-length string immediately after a <newline> in string, regardless of the setting of REG_NOTBOL.

(3) A dollar sign ($) in pattern, when used to specify expression anchoring, shall match the zero-length string immediately before a <newline> in string, regardless of the setting of REG_NOTEOL.
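The subexpression-reporting rules and the REG_NEWLINE behavior above can be checked with a small helper. group_span() is hypothetical: it compiles pattern with the caller's cflags, runs regexec() with a fixed-size pmatch array of ten elements (an arbitrary limit chosen for this sketch), and reports the offsets stored for subexpression i.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>

#define MAX_MATCH 10    /* arbitrary size of the pmatch array */

/*
 * Hypothetical helper: match string against pattern (compiled with
 * cflags, which should not include REG_NOSUB or pmatch is ignored) and
 * store pmatch[i].rm_so and pmatch[i].rm_eo in *so and *eo.
 * Returns 0 on a match, REG_NOMATCH if there is no match, the regcomp()
 * error code if pattern does not compile, or -1 if i is out of range.
 */
int group_span(const char *pattern, int cflags, const char *string,
               size_t i, regoff_t *so, regoff_t *eo)
{
    regex_t re;
    regmatch_t pmatch[MAX_MATCH];
    int rc;

    if (i >= MAX_MATCH)
        return -1;
    rc = regcomp(&re, pattern, cflags);
    if (rc != 0)
        return rc;

    rc = regexec(&re, string, MAX_MATCH, pmatch, 0);
    if (rc == 0) {
        *so = pmatch[i].rm_so;  /* -1 if subexpression i did not take part */
        *eo = pmatch[i].rm_eo;
    }
    regfree(&re);
    return rc;
}
```

With REG_NEWLINE set, for example, the basic regular expression "^b" matches "a\nb" at byte offset 2; without it, regexec() returns REG_NOMATCH.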
The regfree() function shall free any memory allocated by regcomp() associated with preg.

The regerror() function provides a mapping from error codes returned by regcomp() and regexec() to unspecified printable strings. It shall generate a string corresponding to the value of the errcode argument, which shall be the last nonzero value returned by regcomp() or regexec() with the given value of preg. If errcode is not such a value, the content of the generated string is unspecified. If preg is (regex_t *)0, but errcode is a value returned by a previous call to regexec() or regcomp(), then regerror() still shall generate an error string corresponding to the value of errcode, but it might not be as detailed under some implementations.

If the errbuf_size argument is not zero, regerror() shall place the generated string into the errbuf_size-byte buffer pointed to by errbuf. If the string (including the terminating null) cannot fit in the buffer, regerror() shall truncate the string and null-terminate the result.

If errbuf_size is zero, regerror() shall ignore the errbuf argument, but shall return the integer value described below.

If the preg argument to regexec() or regfree() is not a compiled regular expression returned by regcomp(), the result is undefined. A preg shall no longer be treated as a compiled regular expression after it is given to regfree().

B.5.3 Returns

On successful completion, the regcomp() function shall return zero.

On successful completion, the regexec() function shall return zero to indicate that string matched pattern, or REG_NOMATCH (which shall be defined in <regex.h>) to indicate no match.
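The errbuf_size rules above allow a two-call idiom: call regerror() once with errbuf_size set to zero to learn the required size (the value B.5.3 specifies as regerror()'s return), allocate that much, and call it again so the message is never truncated. regex_error_message() below is a hypothetical sketch of that idiom; the caller must free() the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the complete message
 * for errcode, or NULL if memory cannot be allocated.  preg may be
 * (regex_t *)0, in which case the message may be less detailed.
 */
char *regex_error_message(int errcode, const regex_t *preg)
{
    /* With errbuf_size == 0, errbuf is ignored; only the size is returned. */
    size_t len = regerror(errcode, preg, NULL, 0);
    char *msg = malloc(len);

    if (msg != NULL)
        regerror(errcode, preg, msg, len);   /* fits, so not truncated */
    return msg;
}
```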
The regerror() function shall return the size of the buffer needed to hold the entire generated string, including the null termination. If the return value is greater than errbuf_size, the string returned in the buffer pointed to by errbuf has been truncated.

Table B-10 - regcomp(), regexec() Return Values
_________________________________________________________________________
Error Code      Description
_________________________________________________________________________
REG_NOMATCH     regexec() failed to match
REG_BADPAT      Invalid regular expression
REG_ECOLLATE    Invalid collating element referenced
REG_ECTYPE      Invalid character class type referenced
REG_EESCAPE     Trailing \ in pattern
REG_ESUBREG     Number in \digit invalid or in error
REG_EBRACK      [ ] imbalance
REG_EPAREN      \( \) or ( ) imbalance
REG_EBRACE      \{ \} imbalance
REG_BADBR       Content of \{ \} invalid: not a number, number too
                large, more than two numbers, first larger than second
REG_ERANGE      Invalid endpoint in range expression
REG_ESPACE      Out of memory
REG_BADRPT      ?, *, or + not preceded by valid regular expression
_________________________________________________________________________

B.5.4 Errors

If regcomp() or regexec() fails, it shall return a nonzero value indicating the type of failure. Table B-10 contains the names of macros for error codes that may be returned. If a code is returned, the interpretation shall be as given in the table.
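The sizing contract above (the return value counts the terminating null, and a zero errbuf_size makes regerror() only report the size) supports a two-call pattern that never truncates. A minimal sketch follows; regerror_alloc() is a hypothetical helper name, not an interface of this standard.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the message for code,
 * which must be the last nonzero value returned by regcomp() or regexec()
 * for preg.  Returns NULL if memory cannot be allocated.
 */
char *regerror_alloc(int code, const regex_t *preg)
{
    /* With errbuf_size 0, regerror() only reports the size needed,
       including the terminating null. */
    size_t len = regerror(code, preg, NULL, 0);
    char *buf = malloc(len);

    if (buf != NULL)
        (void) regerror(code, preg, buf, len);  /* fits exactly */
    return buf;
}
```

The caller owns the returned string and releases it with free(). A caller that instead passes a fixed buffer can detect truncation by comparing the return value with the buffer size.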
The implementation shall define the macros in Table B-10 in <regex.h>, and may define additional macros beginning with ``REG_'' for other error codes. If regcomp() detects an illegal regular expression, it may return REG_BADPAT, or it may return one of the error codes that more precisely describes the error.

BEGIN_RATIONALE

B.5.5 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

An example of using the functions is shown in Figure B-3.

_________________________________________________________________________
#include <regex.h>
#include <stddef.h>     /* NULL */

/*
 * Match string against the extended regular expression in
 * pattern, treating errors as no match.
 *
 * Return 1 for match, 0 for no match.
 */

int
match(const char *string, const char *pattern)
{
    int     status;
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
        return(0);      /* report error */
    }
    status = regexec(&re, string, (size_t) 0, NULL, 0);
    regfree(&re);
    if (status != 0) {
        return(0);      /* report error */
    }
    return(1);
}
_________________________________________________________________________
Figure B-3 - Example Regular Expression Matching

The following demonstrates how the REG_NOTBOL flag could be used with regexec() to find all substrings in a line that match a pattern supplied by a user. (For simplicity of the example, very little error checking is done.)
    /* assumes: regex_t re; regmatch_t pm; int error; char buffer[]; const char *p; */
    (void) regcomp (&re, pattern, 0);
    /* this call to regexec() finds the first match on the line */
    p = &buffer[0];
    error = regexec (&re, p, 1, &pm, 0);
    while (error == 0) {    /* while matches found */
        /* substring found between p + pm.rm_so and p + pm.rm_eo */
        /* offsets are relative to p; an empty match needs extra care */
        p += pm.rm_eo;
        /* this call to regexec() finds the next match */
        error = regexec (&re, p, 1, &pm, REG_NOTBOL);
    }

An application could use regerror(code,preg,NULL,(size_t)0) to find out how big a buffer is needed for the generated string, malloc() a buffer to hold the string, and then call regerror() again to get the string. Alternatively, it could allocate a fixed, static buffer that is big enough to hold most strings (perhaps 128 bytes), and then malloc() a larger buffer if it finds that this is too small.

The regexec() function must fill in all nmatch elements of pmatch, where nmatch and pmatch are supplied by the application, even if some elements of pmatch do not correspond to subexpressions in pattern. The application writer should note that there is probably no reason for using a value of nmatch that is larger than preg->re_nsub+1, since pmatch[0] describes the whole match.

History of Decisions Made

The REG_ICASE flag supports the operations taken by the grep -i option and the historical implementations of ex and vi. Including this flag makes it easier to write application code that does the same thing as these utilities.

The substrings reported in pmatch[] are defined using offsets from the start of the string rather than pointers. Since this is a new interface, there should be no impact on historical implementations or applications, and offsets should be just as easy to use as pointers.
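Because pmatch[0] always describes the whole match, nmatch = re_nsub + 1 is enough to receive every subexpression, and each offset pair indexes directly into the searched string. A minimal sketch, assuming the member names rm_so and rm_eo of the final interface; extract_group() is a hypothetical helper.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy the text matched by subexpression i
 * (0 = whole match) of the extended regular expression pattern into buf.
 * Returns 0 on success; -1 if the pattern does not compile, does not
 * match, i is out of range, subexpression i did not participate, or
 * buf is too small.
 */
int extract_group(const char *pattern, const char *string, size_t i,
                  char *buf, size_t bufsize)
{
    regex_t re;
    regmatch_t *pm;
    int rc = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* pmatch[0] is the whole match, so re_nsub + 1 elements suffice. */
    pm = malloc((re.re_nsub + 1) * sizeof *pm);
    if (pm != NULL && i <= re.re_nsub
        && regexec(&re, string, re.re_nsub + 1, pm, 0) == 0
        && pm[i].rm_so != -1) {
        size_t len = (size_t)(pm[i].rm_eo - pm[i].rm_so);

        if (len < bufsize) {
            memcpy(buf, string + pm[i].rm_so, len);  /* offsets index string */
            buf[len] = '\0';
            rc = 0;
        }
    }
    free(pm);
    regfree(&re);
    return rc;
}
```

For example, with ([a-z]+)@([a-z]+) and "mail bob@example now", groups 0, 1, and 2 yield "bob@example", "bob", and "example"; asking for group 3 fails because re_nsub is 2.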
The change to offsets was made to facilitate future extensions in which the string to be searched is presented to regexec() in blocks, allowing a string to be searched that is not all in memory at once.

A new type regoff_t is used for the elements of pmatch[] to ensure that the application can represent either the largest possible array in memory (important for a POSIX.2-conforming application) or the largest possible file (important for an application using the extension where a file is searched in chunks).

The working group has rejected, at least for now, the inclusion of a regsub() function that would be used to do substitutions for a matched regular expression. While such a routine would be useful to some applications, its utility would be much more limited than that of the matching function described here. Both regular expression parsing and substitution are possible to implement without support other than that required by the C Standard {7}, but matching is much more complex than substituting. The only ``difficult'' part of substitution, given the information supplied by regexec(), is finding the next character in a string when there can be multibyte characters. That is a much wider issue, and one that needs a more general solution.

The errno variable has not been used for error returns to avoid cluttering up the errno namespace for this feature.

In Draft 9, the interface was modified so that the matched substrings rm_sp and rm_ep are in a separate regmatch_t structure instead of in regex_t. This allows a single compiled regular expression to be used simultaneously in several contexts: in main() and a signal handler, perhaps, or in multiple threads of lightweight processes. (The preg argument to regexec() is declared with type const, so the implementation is not permitted to use the structure to store intermediate results.) It also allows an application to request an arbitrary number of substrings from a regular expression. (Previous versions reported only ten substrings.) The number of subexpressions in the regular expression is reported in re_nsub in preg. With this change to regexec(), consideration was given to dropping the REG_NOSUB flag, since the user can now specify this with a zero nmatch argument to regexec(). However, keeping REG_NOSUB allows an implementation to use a different (perhaps more efficient) algorithm if it knows in regcomp() that no subexpressions need be reported. The implementation is only required to fill in pmatch if nmatch is not zero and if REG_NOSUB is not specified. Note that the size_t type, as defined in the C Standard {7}, is unsigned, so the description of regexec() does not need to address negative values of nmatch.

The rules for reporting substrings of extended regular expressions are consistent with those used by Henry Spencer's ``almost public domain'' version of regexec().

The REG_NOTBOL and REG_NOTEOL flags were added to regexec() in Draft 9. REG_NOTBOL was added to allow an application to do repeated searches for the same pattern in a line. If the pattern contains a circumflex character that should match the beginning of a line, then the pattern should only match when matched against the beginning of the line. Without the REG_NOTBOL flag, the application could rewrite the expression for subsequent matches, but in the general case this would require parsing the expression. The need for REG_NOTEOL is not as clear; it was added for symmetry.
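The repeated-search technique that motivated REG_NOTBOL can be written as a complete function. This is a sketch under stated assumptions: count_matches() is a hypothetical name, the offsets are assumed to be named rm_so and rm_eo, and stepping past an empty match by one byte assumes a single-byte locale (the multibyte ``next character'' problem mentioned above is not addressed).

```c
#include <regex.h>

/*
 * Hypothetical helper: count the nonoverlapping matches of the basic
 * regular expression pattern in line.  Every search after the first uses
 * REG_NOTBOL, so a leading ^ can match only at the real start of line.
 * Returns -1 if pattern does not compile.
 */
int count_matches(const char *pattern, const char *line)
{
    regex_t re;
    regmatch_t pm;
    const char *p = line;       /* where the next search starts */
    int eflags = 0;
    int n = 0;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    while (regexec(&re, p, 1, &pm, eflags) == 0) {
        n++;
        if (pm.rm_so == pm.rm_eo) {
            /* Empty match: step one byte past it (single-byte locale
               assumed) so the loop cannot stall at the same place. */
            if (p[pm.rm_eo] == '\0')
                break;
            p += pm.rm_eo + 1;
        } else {
            p += pm.rm_eo;      /* offsets are relative to p, not line */
        }
        eflags = REG_NOTBOL;
    }
    regfree(&re);
    return n;
}
```

With REG_NOTBOL, ^ab finds only one match in "abab"; without the flag the second search would also treat its starting point as the beginning of the line and report two.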
The addition of the regerror() function addresses the historical need for portable application programs to have access to error information more detailed than ``Function failed to compile/match your regular expression for unknown reasons.''

This interface provides for two different methods of dealing with error conditions. The specific error codes (REG_EBRACE, for example), defined in <regex.h>, allow an application to recover from an error if it is able to. Many applications, especially those that use patterns supplied by a user, will not try to deal with specific error cases, but will just use regerror() to obtain a human-readable error message to present to the user.

The regerror() function uses a scheme similar to confstr() to deal with the problem of allocating memory to hold the generated string. The scheme used by strerror() in the C Standard {7} was considered unacceptable since it creates difficulties for multithreaded applications. (POSIX.4a, a standard for threads, started balloting in January 1991.) A different scheme used by regerror() in one draft of this standard was eliminated to improve internal consistency, and because the current interface produced greater consensus than the other.

The preg argument is provided to regerror() to allow an implementation to generate a more descriptive message than would be possible with errcode alone. An implementation might, for example, save the character offset of the offending character of the pattern in a field of preg, and then include that in the generated message string. The implementation may also ignore preg.

A REG_FILENAME flag was considered, but omitted.
This flag caused regexec() to match patterns as described in 3.13 instead of regular expressions. This service is now provided by the fnmatch() function [see B.6].

END_RATIONALE

B.6 C Binding for Match Filename or Pathname

Function: fnmatch()

B.6.1 Synopsis

#include <fnmatch.h>

int fnmatch(const char *pattern, const char *string, int flags);

B.6.2 Description

The fnmatch() function shall match patterns as described in 3.13.1 and 3.13.2. It checks the string specified by the string argument to see if it matches the pattern specified by the pattern argument.

The flags argument modifies the interpretation of pattern and string. It is the bitwise inclusive OR of zero or more of the flags shown in Table B-11, which are defined in the header <fnmatch.h>. If the FNM_PATHNAME flag is set in flags, then a slash character in string shall be explicitly matched by a slash in pattern; it shall not be matched by either the asterisk or question-mark special characters, nor by a bracket expression. If the FNM_PATHNAME flag is not set, the slash character shall be treated as an ordinary character.

If FNM_NOESCAPE is not set in flags, a backslash character (\) in pattern followed by any other character shall match that second character in string. In particular, '\\' shall match a backslash in string. If FNM_NOESCAPE is set, a backslash character shall be treated as an ordinary character.
Table B-11 - fnmatch() flags Argument
_________________________________________________________________________
flags           Description
_________________________________________________________________________
FNM_NOESCAPE    Disable backslash escaping
FNM_PATHNAME    Slash in string only matches slash in pattern
FNM_PERIOD      Leading period in string must be exactly matched by
                period in pattern
_________________________________________________________________________

If FNM_PERIOD is set in flags, then a leading period in string shall match a period in pattern as described by rule (2) in 3.13.2, where the location of ``leading'' is indicated by the value of FNM_PATHNAME:

  - If FNM_PATHNAME is set, a period is ``leading'' if it is the first character in string or if it immediately follows a slash.

  - If FNM_PATHNAME is not set, a period is ``leading'' only if it is the first character of string.

If FNM_PERIOD is not set, then no special restrictions shall be placed on matching a period.

B.6.3 Returns

If string matches the pattern specified by pattern, then fnmatch() shall return zero. If there is no match, fnmatch() shall return FNM_NOMATCH, which shall be defined in the header <fnmatch.h>. If an error occurs, fnmatch() shall return another nonzero value.

B.6.4 Errors

This standard does not specify any error conditions that are required to be detected by the fnmatch() function. Some errors may be detected under unspecified conditions.

BEGIN_RATIONALE
B.6.5 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The fnmatch() function has two major uses. It could be used by an application or utility that needs to read a directory and apply a pattern against each entry. The find utility is an example of this. It can also be used by the pax utility to process its pattern operands, or by applications that need to match strings in a similar manner.

History of Decisions Made

This function replaces the REG_FILENAME flag of regcomp() in early drafts. It provides virtually the same functionality as the regcomp() and regexec() functions using the REG_FILENAME and REG_FSLASH flags [the REG_FSLASH flag was proposed for regcomp(), and would have had the opposite effect from FNM_PATHNAME], but with a simpler interface and less overhead.

The name fnmatch() is intended to imply filename match, rather than pathname match. The default action of this function is to match filenames, rather than pathnames, since it gives no special significance to the slash character. With the FNM_PATHNAME flag, fnmatch() does match pathnames, but without tilde expansion, parameter expansion, or special treatment for period at the beginning of a filename.

END_RATIONALE
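The directory-scanning use of fnmatch() described in the rationale reduces to one call per candidate name. The sketch below filters an in-memory list instead of reading a directory, so that the effect of each flag is easy to see; filter_names() is a hypothetical helper, and any nonzero result other than FNM_NOMATCH is treated as an error, as B.6.3 allows.

```c
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical helper: store in out[] each entry of names[0..n-1] that
 * matches pattern under flags, as a find-like utility might do for the
 * entries of a directory.  Returns the number stored, or -1 if fnmatch()
 * reports an error (a nonzero value other than FNM_NOMATCH).
 */
int filter_names(const char *pattern, int flags,
                 const char *const names[], size_t n, const char *out[])
{
    size_t i;
    int count = 0;

    for (i = 0; i < n; i++) {
        int rc = fnmatch(pattern, names[i], flags);

        if (rc == 0)
            out[count++] = names[i];
        else if (rc != FNM_NOMATCH)
            return -1;
    }
    return count;
}
```

With the names main.c, .hidden.c, src/util.c, and README, the pattern *.c selects three names with flags 0, two with FNM_PERIOD (.hidden.c is excluded), and only main.c with FNM_PATHNAME|FNM_PERIOD, because the asterisk then cannot match the slash in src/util.c.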
B.7 C Binding for Command Option Parsing

Function: getopt()

B.7.1 Synopsis

#include <unistd.h>

int getopt(int argc, char * const argv[], const char *optstring);

extern char *optarg;
extern int optind, opterr, optopt;

B.7.2 Description

The getopt() function is a command-line parser that can be used by applications that follow Utility Syntax Guidelines 3, 4, 5, 6, 7, 9, and 10 in 2.10.2. The remaining guidelines are not addressed by getopt() and are the responsibility of the application.

The parameters argc and argv are the argument count and argument array as passed to main(). The argument optstring is a string of recognized option characters; if a character is followed by a colon, the option takes an argument. All option characters allowed by Utility Syntax Guideline 3 are allowed in optstring. The implementation may accept other characters as an extension.

The variable optind is the index of the next element of the argv[] vector to be processed. It is initialized to 1 by the system, and getopt() updates it when it finishes with each element of argv[]. When an element of argv[] contains multiple option characters, it is unspecified how getopt() determines which options have already been processed.

The getopt() function shall return the next option character from argv that matches a character in optstring, if there is one that matches. If the option takes an argument, getopt() shall set the variable optarg to point to the option-argument as follows:

(1) If the option was the last character in the string pointed to by an element of argv, then optarg contains the next element of argv, and optind shall be incremented by 2.
If the resulting value of optind is greater than argc, this indicates a missing option argument, and getopt() shall return an error indication.

(2) Otherwise, optarg points to the string following the option character in that element of argv, and optind shall be incremented by 1.

If, when getopt() is called, argv[optind] is a null pointer, *argv[optind] is not the character -, or argv[optind] points to the string "-", getopt() shall return -1 without changing optind. If argv[optind] points to the string "--", getopt() shall return -1 after incrementing optind.

If getopt() encounters an option character that is not contained in optstring, it shall return the question-mark (?) character. If it detects a missing option argument, it shall return the colon character (:) if the first character of optstring was a colon, or a question-mark character otherwise. In either case, getopt() shall set the variable optopt to the option character that caused the error. If the application has not set the variable opterr to zero and the first character of optstring is not a colon, getopt() shall also print a diagnostic message to standard error using the formatting rules specified for the getopts utility (see 4.27.6.2).

B.7.3 Returns

The getopt() function shall return the next option character specified on the command line. The value -1 shall be returned when all command line options have been parsed.

B.7.4 Errors

If an invalid option is encountered, getopt() shall return a question-mark character.
If an option with a missing option argument is encountered, getopt() shall return either a question-mark or a colon, as described previously.

BEGIN_RATIONALE

B.7.5 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The getopt() function is only required to support option characters included in Guideline 3. Many historical implementations of getopt() support other characters as options. This is an allowed extension, but applications that use extensions are not maximally portable. Note that support for multibyte option characters is only possible when such characters can be represented as type int.

The code fragment in Figure B-4 shows how one might process the arguments for a utility that can take the mutually exclusive options a and b and the options f and o, both of which require arguments.

_________________________________________________________________________
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main (int argc, char *argv[ ])
{
    int c;
    int bflg = 0, aflg = 0, errflg = 0;
    char *ifile = NULL;
    char *ofile = NULL;
    extern char *optarg;
    extern int optind, optopt;
    /* . . . */
    while ((c = getopt(argc, argv, ":abf:o:")) != -1) {
        switch (c) {
        case 'a':
            if (bflg)
                errflg = 1;
            else
                aflg = 1;
            break;
        case 'b':
            if (aflg)
                errflg = 1;
            else
                bflg = 1;
            bproc( );       /* application-specific processing for -b */
            break;
        case 'f':
            ifile = optarg;
            break;
        case 'o':
            ofile = optarg;
            break;
        case ':':           /* -f or -o without option-arg */
            fprintf (stderr,
                "Option -%c requires an option-argument\n",
                optopt);
            errflg = 1;
            break;
        case '?':
            fprintf (stderr,
                "Unrecognized option: -%c\n", optopt);
            errflg = 1;
            break;
        }
    }
    if (errflg) {
        fprintf(stderr, "usage: . . . ");
        exit(2);
    }
    for ( ; optind < argc; optind++) {
        if (access(argv[optind], R_OK)) {
            /* . . . */
        }
    }
    /* . . . */
    return (0);
}
_________________________________________________________________________
Figure B-4 - Argument Processing with getopt()

The code in Figure B-4 accepts any of the following as equivalent:

    cmd -ao arg path path
    cmd -a -o arg path path
    cmd -o arg -a path path
    cmd -a -o arg -- path path
    cmd -a -oarg path path
    cmd -aoarg path path

History of Decisions Made

Support for the optopt variable was added in Draft 9. This documents historical practice, and allows the application to obtain the identity of the invalid option.

The description was extensively rewritten in Draft 9 to be more explicit about how optarg and optind are set, and to recognize that this routine deals with a vector of string pointers, not directly with a shell command line.

The description was modified in Draft 9 to make it clear that getopt(), like the getopts utility, shall deal with option-arguments whether separated from the option by <blank>s or not. Note that the requirements on getopt() and getopts are more stringent than the Utility Syntax Guidelines.

The getopt() function has been changed to return -1, rather than EOF, so that <stdio.h> is not required.

The special significance of a colon as the first character of optstring was added in Draft 11 to make getopt() consistent with the getopts utility. It allows an application to make a distinction between a missing argument and an incorrect option letter without having to examine the option letter. It is true that a missing argument can only be detected in one case, but that is a case that has to be considered.
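The equivalences listed for Figure B-4 can be checked mechanically. The sketch below restates the switch of Figure B-4 as a function that records its results instead of printing them; struct opts and parse_args() are hypothetical names. It resets optind to 1 before each scan so that it can be called more than once; that practice is widely supported but is an assumption of this sketch, not a requirement of B.7.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Hypothetical result record for parse_args(). */
struct opts {
    int aflg, bflg, errflg;
    const char *ifile, *ofile;
    int bad;        /* ':' or '?' for the last error, else 0 */
    int badopt;     /* optopt value for that error */
    int operand;    /* index in argv of the first operand */
};

/*
 * Parse argv with the option string of Figure B-4.  The leading colon
 * suppresses getopt()'s own diagnostic and makes a missing
 * option-argument return ':' instead of '?'.  Returns errflg.
 */
int parse_args(int argc, char *argv[], struct opts *o)
{
    int c;

    o->aflg = o->bflg = o->errflg = o->bad = o->badopt = 0;
    o->ifile = o->ofile = NULL;
    optind = 1;                 /* assumption: restarts the scan */
    while ((c = getopt(argc, argv, ":abf:o:")) != -1) {
        switch (c) {
        case 'a':
            if (o->bflg) o->errflg = 1; else o->aflg = 1;
            break;
        case 'b':
            if (o->aflg) o->errflg = 1; else o->bflg = 1;
            break;
        case 'f':
            o->ifile = optarg;
            break;
        case 'o':
            o->ofile = optarg;
            break;
        case ':':               /* -f or -o without an option-argument */
        case '?':               /* option character not in the string */
            o->bad = c;
            o->badopt = optopt;
            o->errflg = 1;
            break;
        }
    }
    o->operand = optind;
    return o->errflg;
}
```

For each command line listed above, parse_args() returns 0 with ofile pointing to "arg"; for -f with nothing after it, it records ':' with optopt equal to 'f'.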
END_RATIONALE

B.8 C Binding for Generate Pathnames Matching a Pattern

Functions: glob(), globfree()

B.8.1 Synopsis

#include <glob.h>

int glob(const char *pattern, int flags,
         int (*errfunc)(const char *epath, int eerrno),
         glob_t *pglob);

void globfree(glob_t *pglob);

B.8.2 Description

The glob() function is a pathname generator that implements the rules defined in 3.13, with optional support for rule (3) in 3.13.3.

The header <glob.h> defines the structure type glob_t, which includes at least the members shown in Table B-12.

Table B-12 - Structure Type glob_t
_________________________________________________________________________
Member Type   Member Name   Description
_________________________________________________________________________
size_t        gl_pathc      Count of paths matched by pattern.
char **       gl_pathv      Pointer to a list of matched pathnames.
size_t        gl_offs       Slots to reserve at the beginning of gl_pathv.
_________________________________________________________________________

The argument pattern is a pointer to a pathname pattern to be expanded. The glob() function shall match all accessible pathnames against this pattern and develop a list of all pathnames that match. In order to have access to a pathname, glob() requires search permission on every component of a path except the last, and read permission on each directory of any filename component of pattern that contains any of the special characters *, ?, or [.
The glob() function stores the number of matched pathnames into pglob->gl_pathc and a pointer to a list of pointers to pathnames into pglob->gl_pathv. The pathnames are in sort order as defined by 2.2.2.30. The first pointer after the last pathname shall be NULL. If the pattern does not match any pathnames, the returned number of matched paths is set to zero.

It is the caller's responsibility to create the structure pointed to by pglob. The glob() function shall allocate other space as needed, including the memory pointed to by gl_pathv. The globfree() function shall free any space associated with pglob from a previous call to glob().

The argument flags is used to control the behavior of glob(). The value of flags is the bitwise inclusive OR of any of the constants shown in Table B-13, which are defined in <glob.h>.

Table B-13 - glob() flags Argument
_________________________________________________________________________
Name            Description
_________________________________________________________________________
GLOB_APPEND     Append pathnames generated to the ones from a previous
                call to glob().
GLOB_DOOFFS     Make use of pglob->gl_offs. If this flag is set,
                pglob->gl_offs is used to specify how many NULL
                pointers to add to the beginning of pglob->gl_pathv.
                In other words, pglob->gl_pathv shall point to
                pglob->gl_offs NULL pointers, followed by
                pglob->gl_pathc pathname pointers, followed by a NULL
                pointer.
GLOB_ERR        Causes glob() to return when it encounters a directory
                that it cannot open or read. Ordinarily, glob()
                continues to find matches.
GLOB_MARK       Each pathname that is a directory that matches pattern
                has a slash appended.
GLOB_NOCHECK    Support rule (3) in 3.13.3. If pattern does not match
                any pathname, then glob() shall return a list
                consisting of only pattern, and the number of matched
                pathnames is 1.
GLOB_NOESCAPE   Disable backslash escaping.
GLOB_NOSORT     Ordinarily, glob() sorts the matching pathnames
                according to the definition of collation sequence in
                2.2.2.30. When this flag is used, the order of
                pathnames returned is unspecified.
_________________________________________________________________________

The GLOB_APPEND flag can be used to append a new set of words to those generated by a previous call to glob(). The following rules apply when two or more calls to glob() are made with the same value of pglob and without intervening calls to globfree():

(1) The first such call shall not set GLOB_APPEND. All subsequent calls shall set it.

(2) All of the calls shall set GLOB_DOOFFS, or all shall not set it.

(3) After the second call, pglob->gl_pathv shall point to a list containing the following:

    (a) Zero or more NULLs, as specified by GLOB_DOOFFS and pglob->gl_offs.

    (b) Pointers to the pathnames that were in the pglob->gl_pathv list before the call, in the same order as before.

    (c) Pointers to the new pathnames generated by the second call, in the specified order.

(4) The count returned in pglob->gl_pathc shall be the total number of pathnames from the two calls.
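The GLOB_DOOFFS and GLOB_APPEND rules above combine naturally when building an argument vector for one of the exec functions. A minimal sketch follows; build_ls_argv() is a hypothetical helper, and GLOB_NOCHECK is added (an assumption of this sketch, not something the rules require) so that a pattern that matches nothing is passed through unchanged, much as the shell would pass it.

```c
#include <glob.h>
#include <stddef.h>

/*
 * Hypothetical helper: fill g with the argument vector for
 * "ls -l pat1 pat2", following the GLOB_DOOFFS and GLOB_APPEND rules:
 * two reserved slots, the expansion of pat1, then the expansion of pat2
 * appended after it (not merged into one sorted list).  GLOB_NOCHECK
 * passes a pattern that matches nothing through unchanged.
 * Returns 0 or the nonzero glob() error code; the caller calls globfree().
 */
int build_ls_argv(const char *pat1, const char *pat2, glob_t *g)
{
    int rc;

    g->gl_offs = 2;                     /* room for "ls" and "-l" */
    rc = glob(pat1, GLOB_DOOFFS | GLOB_NOCHECK, NULL, g);
    if (rc == 0)                        /* rule (2): DOOFFS on every call */
        rc = glob(pat2, GLOB_DOOFFS | GLOB_NOCHECK | GLOB_APPEND, NULL, g);
    if (rc != 0)
        return rc;
    g->gl_pathv[0] = "ls";              /* reserved slots: not filled by glob() */
    g->gl_pathv[1] = "-l";
    return 0;
}
```

After a successful call, g->gl_pathv could be handed to execvp("ls", g->gl_pathv). To be conservative, a caller that does not exec would put NULL back into the two reserved slots before calling globfree(), since they point at strings that glob() did not allocate.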
The application can change any of the fields in Table B-12 after a call to glob(), but if it does, it shall reset them to the original value before a subsequent call, using the same pglob value, to globfree() or to glob() with the GLOB_APPEND flag.

If, during the search, a directory is encountered that cannot be opened or read and errfunc is not NULL, glob() shall call (*errfunc)() with two arguments:

(1) The epath argument is a pointer to the path that failed.

(2) The eerrno argument is the value of errno from the failure, as set by the POSIX.1 {8} opendir(), readdir(), or stat() functions. (Other values may be used to report other errors not explicitly documented for those functions.)

If (*errfunc)() is called and returns nonzero, or if the GLOB_ERR flag is set in flags, glob() shall stop the scan and return GLOB_ABORTED after setting gl_pathc and gl_pathv in pglob to reflect the paths already scanned. If GLOB_ERR is not set and either errfunc is NULL or (*errfunc)() returns zero, the error shall be ignored.

B.8.3 Returns

On successful completion, glob() shall return zero. The member pglob->gl_pathc shall contain the number of matched pathnames, and the member pglob->gl_pathv shall contain a pointer to a null-terminated list of matched and sorted pathnames. However, if pglob->gl_pathc is zero, the content of pglob->gl_pathv is undefined.
Table B-14 - glob() Error Return Values
_________________________________________________________________________
Name            Description
_________________________________________________________________________
GLOB_ABORTED    The scan was stopped because GLOB_ERR was set or
                (*errfunc)() returned nonzero.
GLOB_NOMATCH    The pattern does not match any existing pathname, and
                GLOB_NOCHECK was not set in flags.
GLOB_NOSPACE    An attempt to allocate memory failed.
_________________________________________________________________________

B.8.4 Errors

If glob() terminates due to an error, it shall return one of the nonzero constants shown in Table B-14, which are defined in <glob.h>. The members pglob->gl_pathc and pglob->gl_pathv are still set as defined above in Returns.

BEGIN_RATIONALE

B.8.5 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

This function is not provided for the purpose of enabling utilities to perform pathname expansion on their arguments, as this operation is performed by the shell, and utilities are explicitly not expected to redo this. Instead, it is provided for applications that need to do pathname expansion on strings obtained from other sources, such as a pattern typed by a user or read from a file.

If a utility needs to see if a pathname matches a given pattern, it can use fnmatch().

Note that gl_pathc and gl_pathv have meaning even if glob() fails. This allows glob() to report partial results in the event of an error.
However, if gl_pathc is zero, gl_pathv is unspecified even if glob() did not return an error.

The GLOB_NOCHECK option could be used when an application wants to expand a pathname if wildcards are specified, but wants to treat the pattern as just a string otherwise. The sh utility might use this for option-arguments, for example.

One use of the GLOB_DOOFFS flag is by applications that build an argument list for use with the POSIX.1 {8} execv(), execve(), or execvp() functions. Suppose, for example, that an application wants to do the equivalent of

    ls -l *.c

but for some reason

    system("ls -l *.c")

is not acceptable. The application could obtain (approximately) the same result using the sequence:

    globbuf.gl_offs = 2;
    glob ("*.c", GLOB_DOOFFS, NULL, &globbuf);
    globbuf.gl_pathv[0] = "ls";
    globbuf.gl_pathv[1] = "-l";
    execvp ("ls", &globbuf.gl_pathv[0]);

Using the same example,

    ls -l *.c *.h

could be approximately simulated using GLOB_APPEND as follows:

    globbuf.gl_offs = 2;
    glob ("*.c", GLOB_DOOFFS, NULL, &globbuf);
    glob ("*.h", GLOB_DOOFFS|GLOB_APPEND, NULL, &globbuf);
    ... etc. ...

The new pathnames generated by a subsequent call with GLOB_APPEND are not sorted together with the previous pathnames. This mirrors the way that the shell handles pathname expansion when multiple expansions are done on a command line.

History of Decisions Made

The interface was simplified to a useful, but less complex, subset.

The errfunc argument was added to allow errors to be reported.

A reviewer claimed that the GLOB_DOOFFS flag is unnecessary because it could be simulated using:
    new = (char **)malloc((n + pglob->gl_pathc + 1)
           * sizeof (char *));
    (void) memcpy (new+n, pglob->gl_pathv,
           pglob->gl_pathc * sizeof(char *));
    (void) memset (new, 0, n * sizeof (char *));
    free (pglob->gl_pathv);
    pglob->gl_pathv = new;

However, this assumes that the memory pointed to by gl_pathv is a block that was separately created using malloc(). This is not necessarily the case. An application should make no assumptions about how the memory referenced by fields in pglob was allocated. It might have been obtained from malloc() in a large chunk, and then carved up within glob(), or it might have been created using a different memory allocator. It is not the intent of this standard to specify or imply how the memory used by glob() is managed.

The structure elements gl_pathc and gl_pathv were renamed from gl_argc and gl_argv in Draft 9. The old names implied an association with the parameters to main() that does not necessarily exist.

The GLOB_APPEND flag was added in Draft 9 at the request of a reviewer. This flag would be used when an application wants to expand several different patterns into a single list.

Tilde and parameter expansion were removed from glob() in Draft 9. Applications that need these expansions should use the wordexp() function [see B.9].

END_RATIONALE
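The GLOB_DOOFFS and GLOB_APPEND fragments in the rationale can be combined into a complete C sketch that builds, without invoking a shell, an argument vector approximating ls -l *.c *.h. The helper name build_ls_argv() is hypothetical, and the use of GLOB_NOCHECK (so that a pattern with no matches is passed through literally, as the shell would pass it) is a choice made for this sketch, not something the rationale specifies.

```c
#include <glob.h>
#include <stddef.h>

/* Hypothetical helper: fill *g with an argument vector approximating
   "ls -l *.c *.h".  gl_offs = 2 with GLOB_DOOFFS reserves two NULL slots
   at the front of gl_pathv for "ls" and "-l".  The "*.h" matches are
   appended after the "*.c" matches and are not sorted together with
   them.  GLOB_NOCHECK (an assumption of this sketch) makes a pattern
   with no matches appear literally instead of failing with
   GLOB_NOMATCH.  Returns 0 on success, or the glob() error value. */
static int build_ls_argv(glob_t *g)
{
    int rc;

    g->gl_offs = 2;
    rc = glob("*.c", GLOB_DOOFFS | GLOB_NOCHECK, NULL, g);
    if (rc != 0)
        return rc;

    /* Every call that shares this glob_t must repeat GLOB_DOOFFS. */
    rc = glob("*.h", GLOB_DOOFFS | GLOB_NOCHECK | GLOB_APPEND, NULL, g);
    if (rc != 0)
        return rc;

    g->gl_pathv[0] = "ls";
    g->gl_pathv[1] = "-l";
    return 0;   /* gl_pathv[2 + gl_pathc] is the terminating NULL */
}
```

A caller would typically follow a successful return with execvp("ls", g.gl_pathv) and call globfree(&g) if the exec fails. Storing "ls" and "-l" in the reserved slots mirrors the rationale's own example; the sketch assumes, as that example does, that doing so does not interfere with a later globfree().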
B.9 C Binding for Perform Word Expansions

Functions: wordexp(), wordfree()

B.9.1 Synopsis

    #include <wordexp.h>

    int wordexp(const char *words, wordexp_t *pwordexp, int flags);
    void wordfree(wordexp_t *pwordexp);

B.9.2 Description

The wordexp() function shall perform word expansions as described in 3.6, subject to quoting as in 3.2, and place the list of expanded words into pwordexp. The expansions shall be the same as would be performed by the shell if words were the part of a command line representing the arguments to a utility. Therefore, words shall not contain an unquoted <newline> or any of the unquoted shell special characters |, &, ;, <, or >, except in the context of command substitution as specified in 3.6.3. It also shall not contain unquoted parentheses or braces, except in the context of command or variable substitution. If words contains an unquoted comment character (number sign) that is the beginning of a token, wordexp() may treat the comment character as a regular character, or may interpret it as a comment indicator and ignore the remainder of words.

The <wordexp.h> header defines the structure type wordexp_t, which includes at least the members shown in Table B-15.

    Table B-15 - Structure Type wordexp_t

    Member Type   Member Name   Description
    -----------   -----------   ------------------------------------------
    size_t        we_wordc      Count of words matched by words.
    char **       we_wordv      Pointer to list of expanded words.
    size_t        we_offs       Slots to reserve at the beginning of
                                we_wordv.
The argument words is a pointer to a string containing one or more words to be expanded. The wordexp() function shall store the number of generated words into we_wordc and a pointer to a list of pointers to words in we_wordv. Each individual field created during field splitting (see 3.6.5) or pathname expansion (see 3.6.6) is a separate word in the we_wordv list. The words are in order as described in 3.6. The first pointer after the last word pointer shall be NULL. The expansion of special parameters described in 3.5.2 is unspecified.

It is the caller's responsibility to create the structure pointed to by pwordexp. The wordexp() function allocates other space as needed, including memory pointed to by we_wordv. The wordfree() function shall free any memory associated with pwordexp from a previous call to wordexp().

The argument flags is used to control the behavior of wordexp(). The value of flags is the bitwise inclusive OR of any of the constants in Table B-16, which are defined in <wordexp.h>.

    Table B-16 - wordexp() flags Argument

    Name            Description
    -------------   ------------------------------------------------------
    WRDE_APPEND     Append words generated to the ones from a previous
                    call to wordexp().
    WRDE_DOOFFS     Make use of we_offs.
                    If this flag is set, we_offs is used to specify how
                    many NULL pointers to add to the beginning of
                    we_wordv. In other words, we_wordv shall point to
                    we_offs NULL pointers, followed by we_wordc word
                    pointers, followed by a NULL pointer.
    WRDE_NOCMD      Fail if command substitution, as specified in 3.6.3,
                    is requested.
    WRDE_REUSE      The pwordexp argument was passed to a previous
                    successful call to wordexp(), and has not been passed
                    to wordfree(). The result shall be the same as if the
                    application had called wordfree() and then called
                    wordexp() without WRDE_REUSE.
    WRDE_SHOWERR    Do not redirect standard error to /dev/null.
    WRDE_UNDEF      Report error on an attempt to expand an undefined
                    shell variable.

The WRDE_APPEND flag can be used to append a new set of words to those generated by a previous call to wordexp(). The following rules apply when two or more calls to wordexp() are made with the same value of pwordexp and without intervening calls to wordfree():

    (1) The first such call shall not set WRDE_APPEND. All subsequent calls shall set it.

    (2) All of the calls shall set WRDE_DOOFFS, or all shall not set it.

    (3) After the second and each subsequent call, we_wordv shall point to a list containing the following:

        (a) Zero or more NULLs, as specified by WRDE_DOOFFS and we_offs.

        (b) Pointers to the words that were in the we_wordv list before the call, in the same order as before.

        (c) Pointers to the new words generated by the latest call, in the specified order.
    (4) The count returned in we_wordc shall be the total number of words from all of the calls.

The application can change any of the fields in Table B-15 after a call to wordexp(), but if it does, it shall reset them to their original values before a subsequent call, using the same pwordexp value, to wordfree() or to wordexp() with the WRDE_APPEND or WRDE_REUSE flag.

If words contains an unquoted <newline>, |, &, ;, <, >, parenthesis, or brace in an inappropriate context, wordexp() shall fail, and the number of expanded words shall be zero.

Unless WRDE_SHOWERR is set in flags, wordexp() shall redirect standard error to /dev/null for any utilities executed as a result of command substitution while expanding words. If WRDE_SHOWERR is set, wordexp() may write messages to standard error if syntax errors are detected while expanding words.

If WRDE_DOOFFS is set, then we_offs shall have the same value for each wordexp() call and the wordfree() call using a given pwordexp.

B.9.3 Returns

If no errors are encountered while expanding words, wordexp() shall return zero. Otherwise it shall return a nonzero value.

B.9.4 Errors

    Table B-17 - wordexp() Return Values

    Name            Description
    -------------   ------------------------------------------------------
    WRDE_BADCHAR    One of the unquoted characters |, &, ;, <, >,
                    parentheses, or braces appears in words in an
                    inappropriate context.
    WRDE_BADVAL     Reference to undefined shell variable when WRDE_UNDEF
                    is set in flags.
    WRDE_CMDSUB     Command substitution requested when WRDE_NOCMD was
                    set in flags.
    WRDE_NOSPACE    Attempt to allocate memory failed.
    WRDE_SYNTAX     Shell syntax error, such as unbalanced parentheses or
                    unterminated string.

If wordexp() terminates due to an error, it shall return one of the nonzero constants shown in Table B-17, which shall be defined in <wordexp.h>. The implementation may define additional error returns beginning with WRDE_.

If wordexp() returns the error value WRDE_NOSPACE, then pwordexp->we_wordc and pwordexp->we_wordv shall be updated to reflect any words that were successfully expanded. In other cases, they shall not be modified.

BEGIN_RATIONALE

B.9.5 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

This function is intended to be used by an application that wants to do all of the shell's expansions on a word or words obtained from a user. For example, if the application prompts for a file name (or list of file names) and then uses wordexp() to process the input, the user could respond with anything that would be valid as input to the shell.

The WRDE_NOCMD flag is provided for applications that, for security or other reasons, want to prevent a user from executing shell commands. Disallowing unquoted shell special characters also prevents unwanted side effects such as executing a command or writing a file.

History of Decisions Made

This function was added in Draft 9 as an alternative to glob().
There has been continuing controversy over exactly what features should be included in glob(). It is hoped that providing wordexp() (which provides all of the shell's word expansions, but will probably be slow to execute) and glob() (which is faster but does only expansion of pathnames, without tilde or parameter expansion) will satisfy the majority of reviewers.

While wordexp() could be implemented entirely as a library routine, it is expected that most implementations will run a shell in a subprocess to do the expansion. Two different approaches have been proposed for how the required information might be presented to the shell and the results returned. They are presented here as examples.

One proposal is to extend the echo utility by adding a -q option. This option would cause echo to add a backslash before each backslash and each <blank> that occurs within an argument. The wordexp() function could then invoke the shell as follows:

    (void) strcpy (buffer, "echo -q ");
    (void) strcat (buffer, words);
    if ((flags & WRDE_SHOWERR) == 0)
        (void) strcat (buffer, " 2>/dev/null");
    f = popen (buffer, "r");

The wordexp() function would read the resulting output, remove unquoted backslashes, and break into words at unquoted <blank>s. If the WRDE_NOCMD flag was set, wordexp() would have to scan words before starting the subshell to make sure that there would be no command substitution. In any case, it would have to scan words for unquoted special characters.

Another proposal is to add the following options to sh:

    -w wordlist
          This option provides a wordlist expansion service to applications. The words in wordlist are expanded, and the following is written to standard output:

          (1) The count of the number of words after expansion, in decimal, followed by a null byte.

          (2) The number of bytes needed to represent the expanded words (not including null separators), in decimal, followed by a null byte.
          (3) The expanded words, each terminated by a null byte.

          If an error is encountered during word expansion, sh exits with a nonzero status after writing the above to report any words successfully expanded.

    -P    Run in "protected" mode. If specified with the -w option, no command substitution is performed.

With these options, wordexp() could be implemented fairly simply by creating a subprocess using fork(), and executing sh using the line:

    execl(<shell path>, "sh", "-P", "-w", words, (char *)0);

after directing standard error to /dev/null.

It seemed objectionable for a library routine to write messages to standard error, unless explicitly requested, so wordexp() is required to redirect standard error to /dev/null to ensure that no messages are generated, even for commands executed for command substitution. The new WRDE_SHOWERR flag can be specified to request that error messages be written.

The WRDE_REUSE flag allows the implementation to avoid the expense of freeing and reallocating memory, if that is possible. A minimal implementation can just call wordfree() when WRDE_REUSE is set.

END_RATIONALE

B.10 C Binding for Get POSIX Configurable Variables

B.10.1 C Binding for Get String-Valued Configurable Variables

Function: confstr()

B.10.1.1 Synopsis

    #include <unistd.h>

    size_t confstr(int name, char *buf, size_t len);

B.10.1.2 Description

The confstr() function provides a method for applications to get configuration-defined string values.
Its use and purpose are similar to the sysconf() function defined in POSIX.1 {8}, but it is used where string values rather than numeric values are returned.

The name argument represents the system variable to be queried. The implementation shall support all of the name values shown in Table B-18, which are defined in <unistd.h>. It may support others.

    Table B-18 - confstr() name Values

    name Value   String returned by confstr()
    ----------   ------------------------------------------------------
    _CS_PATH     A value for the PATH environment variable that finds
                 all standard utilities.

If len is not zero, and if name has a configuration-defined value, confstr() shall copy that value into the len-byte buffer pointed to by buf. If the string to be returned is longer than len bytes, including the terminating null, then confstr() shall truncate the string to len-1 bytes and null-terminate the result. The application can detect that the string was truncated by comparing the value returned by confstr() with len.

If len is zero and buf is NULL, then confstr() still shall return the integer value as defined below, but shall not return a string. If len is zero but buf is not NULL, the result is unspecified.

B.10.1.3 Returns

If name does not have a configuration-defined value, confstr() shall return zero and leave errno unchanged.
If name has a configuration-defined value, the confstr() function shall return the size of buffer that would be needed to hold the entire configuration-defined value. If this return value is greater than len, the string returned in buf has been truncated.

B.10.1.4 Errors

If any of the following conditions occur, confstr() shall return zero and set errno to the corresponding value:

    [EINVAL]   The value of the name argument is invalid.

BEGIN_RATIONALE

B.10.1.5 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

An application can distinguish between an invalid name parameter value and one that corresponds to a configurable variable that has no configuration-defined value by setting errno to zero before the call and checking afterward whether errno has been modified. This mirrors the behavior of sysconf() in POSIX.1 {8}.

The original need for this function was to provide a way of finding the configuration-defined default value for the environment variable PATH. Since PATH can be modified by the user to include directories that could contain utilities replacing POSIX.2 standard utilities, applications need a way to determine the system-supplied PATH environment variable value that contains the correct search path for the POSIX.2 standard utilities.

An application could use

    confstr(name, NULL, (size_t) 0)

to find out how big a buffer is needed for the string value, malloc() a buffer to hold the string, and call confstr() again to get the string. Alternatively, it could allocate a fixed, static buffer that is big enough to hold most answers (perhaps 512 or 1024 bytes), but then malloc() a larger buffer if it finds that this is too small.

History of Decisions Made

In Draft 7, these values and sysconf() values defined in POSIX.1 {8} were obtained using a function named posixconf(). However, that routine was dropped in favor of csysconf().
There did not seem to be any reason to provide the redundant interface to POSIX.1 {8} functions, nor to return values as strings when numeric values are really what are needed.

The csysconf() function could be extended to return strings for other related standards or features.

In Draft 9, csysconf() has been replaced by confstr(). The name was changed because too many people were confused by the name; they thought that the 'c' referred to the C language, rather than characters (as distinct from integers). The confstr() function also copies the returned string into a buffer supplied by the application instead of returning a pointer to a string. This allows a cleaner interface in some implementations (lightweight processes were mentioned), and resolves questions about when the application must copy the string returned.

END_RATIONALE

B.10.2 C Binding for Get Numeric-Valued Configurable Variables

Functions: sysconf(), pathconf(), fpathconf()

A system that supports the C Language Bindings Option shall support the C language bindings defined in POSIX.1 {8} for the sysconf(), pathconf(), and fpathconf() functions. Of the name values defined in POSIX.1 {8}, only those that correspond to numeric-valued configuration values listed in Table 7-1 are required by POSIX.2. In addition, the sysconf() function shall support the name values in Table B-19, defined in <unistd.h>, to report the values described in 2.13.1.

BEGIN_RATIONALE

B.10.3 Rationale. (This subclause is not a part of P1003.2)

In Draft 9, the name values corresponding to the _POSIX2_* symbolic limits were changed to more closely follow the convention used in POSIX.1 {8}.
In POSIX.1 {8}, for example, the name value for {_POSIX_VERSION} is _SC_VERSION. The POSIX.2 name value for {_POSIX2_C_DEV} (actually, it was {_POSIX_C_DEV} in Draft 8) was _SC_POSIX_C_DEV, and is now _SC_2_C_DEV.

If sysconf(_SC_2_VERSION) is not equal to the value of the {_POSIX2_VERSION} symbolic constant (see B.2.2), the utilities available via system() or popen() might not behave as described in this standard. This would mean that the application is not running in an environment that conforms to POSIX.2. Some applications might be able to deal with this, others might not. However, the interfaces defined in Annex B shall continue to operate as specified, even if sysconf(_SC_2_VERSION) reports that the utilities no longer perform as specified.

END_RATIONALE

    Table B-19 - C Bindings for Numeric-Valued Configurable Variables

    Symbolic Limit          name Value
    ---------------------   ---------------------
    {BC_BASE_MAX}           _SC_BC_BASE_MAX
    {BC_DIM_MAX}            _SC_BC_DIM_MAX
    {BC_SCALE_MAX}          _SC_BC_SCALE_MAX
    {BC_STRING_MAX}         _SC_BC_STRING_MAX
    {COLL_WEIGHTS_MAX}      _SC_COLL_WEIGHTS_MAX
    {EXPR_NEST_MAX}         _SC_EXPR_NEST_MAX
    {LINE_MAX}              _SC_LINE_MAX
    {RE_DUP_MAX}            _SC_RE_DUP_MAX
    {POSIX2_VERSION}        _SC_2_VERSION
    {POSIX2_C_DEV}          _SC_2_C_DEV
    {POSIX2_FORT_DEV}       _SC_2_FORT_DEV
    {POSIX2_FORT_RUN}       _SC_2_FORT_RUN
    {POSIX2_LOCALEDEF}      _SC_2_LOCALEDEF
    {POSIX2_SW_DEV}         _SC_2_SW_DEV

B.11 C Binding for Locale Control

The C binding to the services described in 7.9 shall be the setlocale() function defined in POSIX.1 {8} 8.1.2.
In addition to the category values defined in POSIX.1 {8}, setlocale() shall also accept the value LC_MESSAGES, which shall be defined in <locale.h>.

BEGIN_RATIONALE

B.11.1 C Binding for Locale Control Rationale. (This subclause is not a part of P1003.2)

The order in which the various locale categories are processed by setlocale() is not specified by POSIX.1 {8}, so the place for LC_MESSAGES in that order is also unspecified.

END_RATIONALE

Annex C
(normative)
FORTRAN Development and Runtime Utilities Options

This annex describes utilities used for the development of FORTRAN language applications, including compilation or translation of FORTRAN source code, and the execution of certain FORTRAN applications at runtime. The utilities described in this annex may be provided by the conforming system; however, any system claiming conformance to the FORTRAN Development Utilities Option shall provide the fort77 utility, and any system claiming conformance to the FORTRAN Runtime Utilities Option shall provide the asa utility.

BEGIN_RATIONALE

C.0.1 FORTRAN Development and Runtime Utilities Options Rationale. (This subclause is not a part of P1003.2)

This clause is included in this standard as a temporary measure to accommodate existing FORTRAN developers. It is the intention of the POSIX.2 working group that this annex be moved from this standard to the emerging standard being developed by the POSIX.9 working group, which will specify FORTRAN-specific interfaces to the basic services provided by this standard and POSIX.1. The movement of this annex should occur in a later version of this standard.
See the rationale for asa for a description of the FORTRAN Runtime Utilities Option and why it was split off from the FORTRAN Development Utilities Option.

END_RATIONALE

C.1 asa - Interpret carriage-control characters

This utility is optional. It shall be provided on systems that support the FORTRAN Runtime Utilities Option.

C.1.1 Synopsis

    asa [file ...]

C.1.2 Description

The asa utility shall write its input files to standard output, mapping carriage-control characters from the text files to line-printer control sequences in an implementation-defined manner.

The first character of every line shall be removed from the input, and the following actions shall be performed.

If the character removed is:

    <space>   The rest of the line shall be output without change.

    0         A <newline> shall be output, then the rest of the input line.

    1         One or more implementation-defined characters that cause an advance to the next page shall be output, followed by the rest of the input line.

    +         The <newline> of the previous line shall be replaced with one or more implementation-defined characters that cause printing to return to column position 1, followed by the rest of the input line. If the + is the first character in the input, it shall have the same effect as <space>.

The action of the asa utility is unspecified upon encountering any character other than those listed above as the first character in a line.

C.1.3 Options

None.

C.1.4 Operands

    file      A pathname of a text file used for input. If no file operands are specified, the standard input shall be used.
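The carriage-control mapping in C.1.2 can be sketched in C as a function that transforms an in-memory text buffer. The control sequences themselves are implementation-defined; this sketch assumes the historical choices described in the rationale for this utility (a <form-feed> for '1', a <carriage-return> for '+') and treats unlisted first characters as <space>, as that rationale suggests. The function name asa_map() and its buffer-size contract are hypothetical; a real asa would read its file operands or standard input and write to standard output.

```c
#include <stddef.h>

/* Hypothetical sketch of the asa mapping for one in-memory buffer.
   Each line's <newline> is deferred until the next line's control
   character is known, so that '+' can replace it with '\r'.
   'out' must have room for 2 * strlen(in) + 2 bytes.
   Returns the number of bytes written, excluding the final '\0'. */
static size_t asa_map(const char *in, char *out)
{
    size_t n = 0;
    int first = 1;

    while (*in != '\0') {
        char cc = *in;                  /* carriage-control character */
        if (cc != '\n')
            in++;                       /* remove it from the line; an empty
                                           line is treated like <space> */

        if (cc == '+' && !first) {
            out[n++] = '\r';            /* overprint: replaces previous <newline> */
        } else {
            if (!first)
                out[n++] = '\n';        /* deferred <newline> of previous line */
            if (cc == '0')
                out[n++] = '\n';        /* one extra <newline> */
            else if (cc == '1')
                out[n++] = '\f';        /* advance to the next page */
            /* <space>, a leading '+', and unlisted characters add nothing */
        }
        first = 0;

        while (*in != '\0' && *in != '\n')   /* copy the rest of the line */
            out[n++] = *in++;
        if (*in == '\n')
            in++;
    }
    if (!first)
        out[n++] = '\n';                /* terminate the last line */
    out[n] = '\0';
    return n;
}
```

For example, the input " a\n0b\n+c\n1d\n" maps to "a\n\nb\rc\n\fd\n": the '0' line is preceded by an extra <newline>, the '+' line overprints the previous one, and the '1' line starts on a new page.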
C.1.5 External Influences

C.1.5.1 Standard Input

The standard input shall be used only if no file operands are specified. See Input Files.

C.1.5.2 Input Files

The input files shall be text files.

C.1.5.3 Environment Variables

The following environment variables shall affect the execution of asa:

    LANG          This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

    LC_ALL        This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

    LC_CTYPE      This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

    LC_MESSAGES   This variable shall determine the language in which messages should be written.

C.1.5.4 Asynchronous Events

Default.

C.1.6 External Effects

C.1.6.1 Standard Output

The standard output shall be the text from the input file modified as described in C.1.2.

C.1.6.2 Standard Error

None.

C.1.6.3 Output Files

None.

C.1.7 Extended Description

None.

C.1.8 Exit Status

The asa utility shall exit with one of the following values:

     0    All input files were output successfully.

    >0    An error occurred.

C.1.9 Consequences of Errors

Default.

BEGIN_RATIONALE

C.1.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The asa utility is needed to map "standard" FORTRAN 77 output into a form acceptable to contemporary printers. Usually asa is used to pipe data to the lp utility (see lp in 4.38).

The following command:

    asa file

permits the viewing of file (created by a program using FORTRAN-style carriage-control characters) on a terminal.

The following command:

    a.out | asa | lp

formats the FORTRAN output of a.out and directs it to the printer.

History of Decisions Made

This utility is generally used only by FORTRAN programs. It was moved to this annex in response to multiple ballot objections requesting its removal. The working group decided to retain asa to avoid breaking the existing large base of FORTRAN applications that put carriage-control characters in their output files. This is a compromise position to achieve balloting acceptance: the overhead of maintaining a separate option in POSIX.2 for just this one utility is seen to be small in comparison to the benefit achieved for FORTRAN applications. Since it is a separate option, there is no requirement that a system have a FORTRAN compiler in order to run applications that need asa.

Historical implementations have used an ASCII <form-feed> character in response to a '1', and an ASCII <carriage-return> in response to a '+'. It is suggested that implementations treat characters other than '0', '1', and '+' as <space> in the absence of any compelling reason to do otherwise. However, the action is listed here as "unspecified," permitting an implementation to provide extensions to access fast multiple line slewing and channel seeking in a nonportable manner.

END_RATIONALE

C.2 fort77 - FORTRAN compiler

This utility is optional. It shall be provided on systems that support the FORTRAN Development Utilities Option.

C.2.1 Synopsis

    fort77 [-c] [-g] [-L directory] ...
       [-O optlevel] [-o outfile] [-s] [-w] operand ...

C.2.2 Description

The fort77 utility is the interface to the FORTRAN compilation system; it shall accept the full FORTRAN language defined by ISO 1539 {2}. The system conceptually consists of a compiler and link editor. The files referenced by operands are compiled and linked to produce an executable file. (It is unspecified whether the linking occurs entirely within the operation of fort77; some systems may produce objects that are not fully resolved until the file is executed.)

If the -c option is present, for all pathname operands of the form file.f, the files

    $(basename pathname .f).o

shall be created or overwritten as the result of successful compilation. If the -c option is not specified, it is unspecified whether such .o files are created or deleted for the file.f operands.

If there are no options that prevent link editing (such as -c) and all operands compile and link without error, the resulting executable file shall be written into the file named by the -o option (if present) or to the file a.out. The executable file shall be created as specified in 2.9.1.4, except that the file permissions shall be set to

    S_IRWXO | S_IRWXG | S_IRWXU

(see POSIX.1 {8} 5.6.1.2) and that the bits specified by the umask of the process shall be cleared.

C.2.3 Options

The fort77 utility shall conform to the utility argument syntax guidelines described in 2.10.2, except that:

- The -l library operands have the format of options, but their position within a list of operands affects the order in which libraries are searched.

- The order of specifying the multiple -L options is significant.
- Conforming applications shall specify each option separately; that is, grouping option letters (e.g., -cg) need not be recognized by all implementations.

The following options shall be supported by the implementation:

-c  Suppress the link-edit phase of the compilation, and do not remove any object files that are produced.

-g  Produce symbolic information in the object or executable files; the nature of this information is unspecified, and may be modified by implementation-defined interactions with other options.

-s  Produce object and/or executable files from which symbolic and other information not required for proper execution using the POSIX.1 {8} exec family has been removed (stripped). If both -g and -s options are present, the action taken is unspecified.

-o outfile
    Use the pathname outfile, instead of the default a.out, for the executable file produced. If the -o option is present with -c, the result is unspecified.

-L directory
    Change the algorithm of searching for the libraries named in -l operands to look in the directory named by the directory pathname before looking in the usual places. Directories named in -L options shall be searched in the specified order. Implementations shall support at least ten instances of this option in a single fort77 command invocation. If a directory specified by a -L option contains a file named libf.a, the results are unspecified.

-O optlevel
    Specify the level of code optimization. If the optlevel option-argument is the digit 0, all special code optimizations shall be disabled. If it is the digit 1, the nature of the optimization is unspecified. If the -O option is omitted, the nature of the system's default optimization is unspecified. It is unspecified whether code generated in the presence of the -O 0 option is the same as that generated when -O is omitted. Other optlevel values may be supported.
-w  Suppress warnings.

Multiple instances of -L options can be specified.

C.2.4 Operands

An operand is either in the form of a pathname or the form -l library. At least one operand of the pathname form shall be specified. The following operands shall be supported by the implementation:

file.f
    The pathname of a FORTRAN source file to be compiled and optionally passed to the link editor. The file name operand shall be of this form if the -c option is used.

file.a
    A library of object files typically produced by ar (see 6.1), and passed directly to the link editor. Implementations may recognize implementation-defined suffixes other than .a as denoting object file libraries.

file.o
    An object file produced by fort77 -c, and passed directly to the link editor. Implementations may recognize implementation-defined suffixes other than .o as denoting object files.

The processing of other files is implementation defined.

-l library
    (The letter ell.) Search the library named:

        liblibrary.a

    A library is searched when its name is encountered, so the placement of a -l operand is significant. Several standard libraries can be specified in this manner, as described in C.2.7. Implementations may recognize implementation-defined suffixes other than .a as denoting libraries.

C.2.5 External Influences

C.2.5.1 Standard Input

None.

C.2.5.2 Input Files

The input file shall be one of the following: a text file containing FORTRAN source code; an object file in the format produced by fort77 -c; or a library of object files, in the format produced by archiving zero or more object files, using ar.
Implementations may supply additional utilities that produce files in these formats. Additional input files are implementation defined.

A <tab> encountered within the first six characters on a line of source code shall cause the compiler to interpret the following character as if it were the seventh character on the line (i.e., in column 7).

C.2.5.3 Environment Variables

The following environment variables shall affect the execution of fort77:

LANG         This variable shall determine the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. See 2.6.

LC_ALL       This variable shall determine the locale to be used to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

LC_CTYPE     This variable shall determine the locale for the interpretation of sequences of bytes of text data as characters (e.g., single- versus multibyte characters in arguments and input files).

LC_MESSAGES  This variable shall determine the language in which messages should be written.

TMPDIR       This variable shall be interpreted as a pathname that should override the default directory for temporary files, if any.

C.2.5.4 Asynchronous Events

Default.

C.2.6 External Effects

C.2.6.1 Standard Output

None.

C.2.6.2 Standard Error

Used only for diagnostic messages. If more than one file operand ending in .f (or possibly other unspecified suffixes) is given, for each such file:

    "%s:\n", <file>

may be written to allow identification of the diagnostic message with the appropriate input file.

This utility may produce warning messages about certain conditions that do not warrant returning an error (nonzero) exit value.
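The optional identification line described in C.2.6.2 can be illustrated with a small C sketch. It is not taken from any implementation: it assumes that only the .f suffix marks a FORTRAN source operand (implementations may recognize other suffixes), and the helper names is_fortran_source, count_fortran_sources, and format_diagnostic are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: nonzero if the operand ends in ".f".
 * A bare ".f" is not treated as a source name, and any other
 * suffixes an implementation might accept are ignored here. */
static int is_fortran_source(const char *operand)
{
    size_t len;

    assert(operand != NULL);
    len = strlen(operand);
    return len > 2 && strcmp(operand + len - 2, ".f") == 0;
}

/* Hypothetical helper: counts the .f operands among n operands. */
static int count_fortran_sources(int n, char *const operands[])
{
    int i;
    int count = 0;

    for (i = 0; i < n; i++)
        if (is_fortran_source(operands[i]))
            count++;
    return count;
}

/* Hypothetical helper: formats one diagnostic into buf. When more than
 * one .f operand was given, the message is preceded by the "%s:\n"
 * identification line for the file it concerns; otherwise only the
 * message is written. Returns the snprintf result. */
static int format_diagnostic(char *buf, size_t size, int nsources,
                             const char *file, const char *message)
{
    if (nsources > 1)
        return snprintf(buf, size, "%s:\n%s\n", file, message);
    return snprintf(buf, size, "%s\n", message);
}
```

With the operands a.f src/b.f lib.a, count_fortran_sources returns 2, so a diagnostic concerning a.f would be preceded by the line a.f:. The wording of the diagnostic itself is unspecified; any message text passed in is invented for illustration.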
C.2.6.3 Output Files

Object files, listing files, and/or executable files shall be produced in unspecified formats.

C.2.7 Extended Description

C.2.7.1 Standard Libraries

The fort77 utility shall recognize the following -l operand for the standard library:

-l f  This library contains all library functions referenced in ISO 1539 {2}. An implementation shall not require this operand to be present to cause a search of this library.

In the absence of options that inhibit invocation of the link editor, such as -c, the fort77 utility shall cause the equivalent of a -l f operand to be passed to the link editor as the last -l operand, causing it to be searched after all other object files and libraries are loaded.

It is unspecified whether the library libf.a exists as a regular file. The implementation may accept as -l operands names of objects that do not exist as regular files.

C.2.7.2 External Symbols

The FORTRAN compiler and link editor shall support the significance of external symbols up to a length of at least 31 bytes. The compiler may fold case (i.e., may ignore uppercase/lowercase distinctions between identifiers). The action taken upon encountering symbols exceeding the implementation-defined maximum symbol length is unspecified.

The compiler and link editor shall support a minimum of 511 external symbols per source or object file, and a minimum of 4095 external symbols total. A diagnostic message is written to standard output if the implementation-defined limit is exceeded; other actions are unspecified.

C.2.8 Exit Status

The fort77 utility shall exit with one of the following values:

 0   Successful compilation or link edit.
>0   An error occurred.
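C.2.8 distinguishes only zero from nonzero, and the rationale for C.2.9 says a portable application should rely on that status rather than on the existence or mode of a.out. A C caller might therefore check a fort77 run as in the sketch below. The sketch assumes the command line is run through system(), which invokes sh; the helper name command_succeeded is hypothetical, and operands containing shell metacharacters would need quoting (or a fork() and exec-based launcher instead).

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: runs a command line through the shell and reports
 * success only if the command exited normally with status 0. For a fort77
 * invocation this is the portable test; the presence or mode of the
 * output file is not. */
static int command_succeeded(const char *command_line)
{
    int status;

    assert(command_line != NULL);
    status = system(command_line);
    if (status == -1)          /* the child process could not be created */
        return 0;
    /* system() returns a wait status: termination by a signal or any
     * nonzero exit value counts as failure. */
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A caller would write, for example, if (!command_succeeded("fort77 -o foo xyz.f")) and then treat foo as unusable, whatever state the file is in.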
C.2.9 Consequences of Errors

When fort77 encounters a compilation error, it shall write a diagnostic to standard error and continue to compile other source code operands. It shall return a nonzero exit status, but it is implementation defined whether an object module is created. If the link edit is unsuccessful, a diagnostic message shall be written to standard error, and fort77 shall exit with a nonzero status.

BEGIN_RATIONALE

C.2.10 Rationale. (This subclause is not a part of P1003.2)

Examples, Usage

The following are examples of usage:

fort77 -o foo xyz.f
    Compiles xyz.f and creates the executable foo.

fort77 -c xyz.f
    Compiles xyz.f and creates the object file xyz.o.

fort77 xyz.f
    Compiles xyz.f and creates the executable a.out.

fort77 xyz.f b.o
    Compiles xyz.f, links it with b.o, and creates the executable a.out.

History of Decisions Made

The file inclusion and symbol definition (#define) mechanisms used by the c89 utility were not included in POSIX.2, even though they are commonly implemented, since there is no requirement that the FORTRAN compiler use the C preprocessor.

The -onetrip option was not included in this specification, even though many historical compilers support it, because it is a relic from FORTRAN-66; it is an anachronism that should not be perpetuated.

Some implementations produce compilation listings. This aspect of FORTRAN has been left unspecified because there was opposition within the balloting group to the various methods proposed for implementing it: a -V option overlapped with historical vendor practice and a naming convention of creating files with .l suffixes collided with historical lex file naming practice.
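The usage examples in C.2.10 depend on two rules from C.2.2: with -c, file.f yields $(basename pathname .f).o, and the executable gets S_IRWXU | S_IRWXG | S_IRWXO with the umask bits cleared. The C sketch below illustrates both under stated assumptions. It does not describe how any fort77 computes its output. The hypothetical object_name handles plain pathnames only (no trailing slashes or the other edge cases of basename), and the hypothetical executable_mode reads the umask by briefly resetting it.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: writes $(basename pathname .f).o into buf.
 * Directory components are dropped, so "src/xyz.f" yields "xyz.o" in
 * the current directory. Returns 0 on success, or -1 if the last
 * component does not end in ".f" or buf is too small. */
static int object_name(const char *pathname, char *buf, size_t size)
{
    const char *base;
    size_t len;

    assert(pathname != NULL && buf != NULL);
    base = strrchr(pathname, '/');
    base = (base != NULL) ? base + 1 : pathname;
    len = strlen(base);
    if (len <= 2 || strcmp(base + len - 2, ".f") != 0)
        return -1;
    if (len + 1 > size)           /* stem + ".o" + terminating NUL */
        return -1;
    memcpy(buf, base, len - 2);   /* copy the stem without ".f" */
    memcpy(buf + len - 2, ".o", 3);
    return 0;
}

/* Hypothetical helper: the permission bits C.2.2 requires for the
 * executable, S_IRWXU | S_IRWXG | S_IRWXO with the process umask cleared.
 * umask() can only be read by setting it, so the old mask is restored
 * immediately; this is unsafe if other threads create files meanwhile. */
static mode_t executable_mode(void)
{
    mode_t mask = umask(0);

    umask(mask);
    return (S_IRWXU | S_IRWXG | S_IRWXO) & ~mask;
}
```

For fort77 -c xyz.f, object_name gives xyz.o. With a umask of 022, the a.out from fort77 xyz.f would have mode 0755 under this rule.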
There is no -I option in this version of POSIX.2 to specify a directory for file inclusion. An INCLUDE directive has been a part of the FORTRAN-8X discussions, but it is not clear whether it will be retained.

It is noted that many FORTRAN compilers produce an object module even when compilation errors occur; during a subsequent compilation, the compiler may patch the object module rather than recompiling all the code. Consequently, it is left to the implementor whether or not an object file is created.

The name of this utility was changed to fort77 in Draft 9 to parallel the renaming of the C compiler. The name f77 was avoided to prevent collision with historical implementations.

A reference to MIL-STD-1753 was removed from an earlier draft in response to a request from the POSIX.9 working group. It was not the intention of this document to require certification of the FORTRAN compiler, and the forthcoming POSIX.9 standard does not specify the military standard or any special preprocessing requirements. Furthermore, use of that document would have been inappropriate for an international standard.

The specification of optimization has been subject to changes through early drafts. At one time, -O and -N were Booleans: optimize and do not optimize (with an unspecified default). Some historical practice led this to be changed to:

-O 0   No optimization.
-O 1   Some level of optimization.
-O n   Other, unspecified levels of optimization.

It is not always clear whether ``good code generation'' is the same thing as optimization. Simple optimizations of local actions do not usually affect the semantics of a program. The -O 0 option has been included to accommodate the very fussy nature of scientific calculations in a highly optimized environment; compilers make errors.
Some degree of optimization is expected, even if it is not documented here, and the ability to shut it off completely could be important when porting an application. An implementation may treat -O 0 as ``do less than normal'' if it wishes, but this is only meaningful if any of the operations it performs can affect the semantics of a program. It is highly dependent on the implementation whether doing less than normal makes sense. It is not the intent of this specification to ask for sloppy code generation, but rather to assure that any semantically visible optimization is suppressed.

The specification of standard library access is consistent with the C compiler specification. Implementations are not required to have /usr/lib/libf.a, as many historical implementations do, but if not they are required to recognize 'f' as a token.

External symbol size limits are in a normative subclause; portable applications need to know these limits. However, the required minimum for the maximum symbol length should be taken as a constraint on a portable application, not on an implementation, and consequently the action taken for a symbol exceeding the limit is unspecified. The minimum size for the external symbol table was added for similar reasons.

The Consequences of Errors subclause clearly specifies the compiler's behavior when compilation or link-edit errors occur. The behavior of several historical implementations was examined, and the choice was made to be silent on the status of the executable, or a.out, file in the face of compiler or linker errors. If a linker writes the executable file, then links it on disk with lseek()s and write()s, the partially-linked executable can be left on disk and its execute bits turned off if the link edit fails. However, if the linker links the image in memory before writing the file to disk, it need not touch the executable file (if it already exists) if the link edit fails.
Since both approaches are existing practice, a portable application shall rely on the exit status of fort77, rather than on the existence or mode of the executable file.

The -g and -s options are not specified as mutually exclusive. Historically these two options have been mutually exclusive, but because both are so loosely specified, it seemed cleaner to leave their interaction unspecified.

The requirement that portable applications specify compiler options separately is to reserve the multicharacter option namespace for vendor-specific compiler options, which are known to exist in many historical implementations. Implementations are not required to recognize, for example, -gc as if it were -g -c; nor are they forbidden from doing so. The synopsis shows all of the options separately to highlight this requirement on applications.

Echoing filenames to standard error is considered a diagnostic message, because it would otherwise be difficult to associate an error message with the erring file. These messages are described with ``may'' to allow implementations to use other methods of identifying files and to parallel the description in c89.

END_RATIONALE

Annex D
(informative)

Bibliography

BEGIN_RATIONALE BEGIN_RATIONALE

{B1} ISO 639: 1988, Code for the representation of names of languages.1)

{B2} ISO 2022: 1986, Information processing--ISO 7-bit and 8-bit coded character sets--Code extension techniques.
{B3} ISO 2047: 1975, Information processing--Graphical representations for the control characters of the 7-bit coded character set.

{B4} ISO 3166: 1988, Code for the representation of names of countries.

{B5} ISO 6429: 1988, Information processing--Control functions for 7-bit and 8-bit coded character sets.

{B6} ISO 6937-2: 1983, Information processing--Coded character sets for text communication--Part 2: Latin alphabetic and non-alphabetic graphic characters.

{B7} ISO 8802-3: 1989, Information processing systems--Local area networks--Part 3: Carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specification.

{B8} ISO 8601: 1988, Data elements and interchange formats--Information interchange--Representation of dates and times.

{B9} ISO 8859, Information processing--8-bit single-byte coded graphic character sets. (Parts 1 to 8 published.)

1) ISO documents can be obtained from the ISO office, 1, rue de Varembé, Case Postale 56, CH-1211, Genève 20, Switzerland/Suisse.
{B10} ISO/IEC 10367: ...,2) Information processing--Repertoire of standardized coded graphic character sets for use in 8-bit codes.

{B11} ISO/IEC 10646: ...,3) Information technology--Universal Coded Character Set (UCS).

{B12} International Organization for Standardization/Association Française de Normalisation. Dictionary of Computer Science/Dictionnaire de L'Informatique. Geneva/Paris: ISO/AFNOR, 1989.

{B13} ANSI X3.43-1986,4) Representations for Local Times of the Day for Information Interchange.

{B14} GB 2312-1980, Chinese Association for Standardization. Coded Chinese Graphic Character Set for Information Interchange.

{B15} JIS X0208-1990, Japanese National Committee on ISO/IEC JTC1/SC2. Japanese Graphic Character Set for Information Interchange.

{B16} JIS X0212-1990, Japanese National Committee on ISO/IEC JTC1/SC2. Supplementary Japanese Graphic Character Set for Information Interchange.

{B17} KS C 5601-1987, Korean Bureau of Standards. Korean Graphic Character Set for Information Interchange.

{B18} IEEE Std 100-1988, IEEE Standard Dictionary of Electrical and Electronics Terms.
{B19} IEEE P1003.3,5) Standard for Information Technology--Test Methods for Measuring Conformance to POSIX.

{B20} IEEE P1003.3.2,6) Standard for Information Technology--Test Methods for Measuring Conformance to POSIX.2.

{B21} Aho, Alfred V., Kernighan, Brian W., Weinberger, Peter J., The AWK Programming Language, Reading, MA: Addison-Wesley, 1988.

2) To be approved and published.
3) To be approved and published.
4) ANSI documents can be obtained from the Sales Department, American National Standards Institute, 1430 Broadway, New York, NY 10018.
5) To be approved and published.
6) To be approved and published.

{B22} Aho, Alfred V., Sethi, Ravi, Ullman, Jeffrey D., Compilers, Principles, Techniques, and Tools, Reading, MA: Addison-Wesley, 1986.

{B23} Aho, Alfred V., Ullman, Jeffrey D., Principles of Compiler Design, Reading, MA: Addison-Wesley, 1977.

{B24} American Telephone and Telegraph Company. System V Interface Definition (SVID), Issues 2 and 3. Morristown, NJ: UNIX Press, 1986, 1989.7)

{B25} Bolsky, Morris I., Korn, David G., The KornShell Command and Programming Language, Englewood Cliffs, NJ: Prentice Hall, 1988.

{B26} DeRemer, Frank, and Thomas J. Pennello, ``Efficient Computation of LALR(1) Look-ahead Sets.'' SigPlan Notices 15:8, 176-187, August, 1979.

{B27} Knuth, D. E. ``On the translation of languages from left to right.'' Information and Control 8:6, 607-639.
{B28} University of California at Berkeley--Computer Science Research Group. 4.3 Berkeley Software Distribution, Virtual VAX-11 Version. Berkeley, CA: The Regents of the University of California, April 1986.

{B29} /usr/group Standards Committee. 1984 /usr/group Standard. Santa Clara, CA: UniForum, 1984.

{B30} X/Open Company, Ltd. X/Open Portability Guide, Issue 2. Amsterdam: Elsevier Science Publishers, 1987.

{B31} X/Open Company, Ltd. X/Open Portability Guide, Issue 3. Englewood Cliffs, NJ: Prentice-Hall, 1989.

END_RATIONALE END_RATIONALE

7) This is one of several documents that represent an industry specification in an area related to POSIX.2. The creators of such documents may be able to identify newer versions that may be interesting.

Annex E
(informative)

Rationale and Notes

BEGIN_RATIONALE

This annex summarizes the deliberations of the IEEE P1003.2 Working Group, the committee charged by the IEEE Computer Society's Technical Committee on Operating Systems and Application Environments with devising an interface standard for a shell and related utilities to support and extend POSIX.1. The annex is being published along with the standard to assist in the process of review. It contains historical information concerning the contents of the standard and why features were included or discarded by the Working Group. It also contains notes of interest to application programmers on recommended programming practices, emphasizing the consequences of some aspects of the standard that may not be immediately apparent.

Just as this standard relies on the knowledge of architecture, history, and definitions from POSIX.1, so does this annex.
The reader is referred to the Rationale and Notes appendix of POSIX.1 for background material and bibliographic information about UNIX systems in general and POSIX specifically, which will not be duplicated here.

BEGIN_RATIONALE

E.1 General

Editor's Note: The text of the Rationale for this section has been temporarily located in Section 1, adjacent to the text it is explaining. The text will return to this annex after the completion of balloting.

E.1.1 Scope

E.1.2 Normative References

E.1.3 Conformance

BEGIN_RATIONALE

E.2 Terminology and General Requirements

Editor's Note: The text of the Rationale for this section has been temporarily located in Section 2, adjacent to the text it is explaining. The text will return to this annex after the completion of balloting.

E.2.1 Conventions

E.2.2 Definitions

E.2.3 Built-in Utilities

E.2.4 Character Set

E.2.5 Locale

E.2.6 Environment Variables

E.2.7 Required Files

E.2.8 Regular Expression Notation

E.2.9 Dependencies on Other Standards
E.2.10 Utility Conventions

E.2.11 Utility Description Defaults

E.2.12 File Format Notation

E.2.13 Configuration Values

BEGIN_RATIONALE

E.3 Shell Command Language

Editor's Note: The text of the Rationale for this section has been temporarily located in Section 3, adjacent to the text it is explaining. The text will return to this annex after the completion of balloting.

E.3.1 Shell Definitions

E.3.2 Quoting

E.3.3 Token Recognition

E.3.4 Reserved Words

E.3.5 Parameters and Variables

E.3.6 Word Expansions

E.3.7 Redirection

E.3.8 Exit Status for Commands

E.3.9 Shell Commands

E.3.10 Shell Grammar

E.3.11 Signals and Error Handling

E.3.12 Shell Execution Environment

E.3.13 Pattern Matching Notation

E.3.14 Special Built-in Utilities

BEGIN_RATIONALE

E.4 Execution Environment Utilities

Editor's Note: The text of the Rationale for this section has been temporarily located in Section 4, adjacent to the text it is explaining. The text will return to this annex after the completion of balloting.

Notations regarding utilities probably included in the UPE have been updated, without diff marks, based on the current working draft of 1003.2a.

Many utilities were evaluated by the working group; more utilities were excluded from the standard than included.
The following list contains many common UNIX system utilities that were not included as Execution Environment Utilities or in one of the Software Development Environment groups. It is logistically difficult for this Rationale to correctly distribute the reasons for not including a utility among the various utility environment sections. Therefore, this section covers the reasons for all utilities not included in Sections 4 and 6 and Annexes A and C.

The working group started its deliberations with a recommended list of utilities provided by the X/Open group of companies. This list was a subset of the utilities in the X/Open Portability Guide, Issue II, so it was very closely related to System V. The list had already been purged of purely administrative utilities, such as those found in System V's Administered System Extension. Then, the working group applied its scope as a filter and substantially pruned the remaining list as well. The following list of ``rejected'' utilities is limited by its historical roots; since the selected utilities emerged primarily from a System V base, this list does not include some familiar entries from BSD.

The working group received substantial input from representatives of the University of California at Berkeley and from companies that are firmly allied with BSD versions of the UNIX system, enough so that some BSD-derived utilities are included in the standard. However, this Rationale is now limited to a discussion of only those utilities actively or indirectly evaluated by the working group, rather than the list of all known UNIX utilities from all its variants. This list will most likely be augmented during the balloting process as balloters request specific rationales for their favorite commands.
In the list, the notation [POSIX.2a] is used to identify utilities that are being evaluated for inclusion in the forthcoming User Portability Extension to this standard. Similarly, [POSIX.7] is used for those that may be appropriate for the working group evaluating system administration and [POSIX.Net] for networking standards.

adb       The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This utility is primarily a debugging tool. Furthermore, many useful aspects of adb are very hardware-specific.

admin     The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This SCCS utility is primarily a development tool.

as        Assemblers are hardware-specific and are included implicitly as part of the compilers in the standard.

at        The at and cron family of utilities were omitted because portable applications could not rely on their behavior. [POSIX.2a]

banner    The only known use of this command is as part of the LP printer header pages. It was decided that the format of the header is implementation defined, so this utility is superfluous to application portability.

batch     The at and cron family of utilities were omitted because portable applications could not rely on their behavior. [POSIX.2a]

cal       This calendar printing program is not useful to portable applications.

calendar  This reminder service program is not useful to portable applications.

cancel    The LP (line printer spooling) system specified is the most basic possible and did not need this level of application control.
[_P_O_S_I_X._7] cflow The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This utility is primarily a debugging tool. chroot This is primarily of administrative use, requiring super- user privileges. [_P_O_S_I_X._7] col No utilities defined in this standard produce output requiring such a filter. The nroff text formatter is present on many historical systems and will continue to remain as an extension; col is expected to be shipped by all the systems that ship nroff. cpio This has been replaced by pax, for reasons explained in its own Rationale. cpp Can be subsumed by c89. crontab The at and cron family of utilities were omitted because portable applications could not rely on their behavior. [_P_O_S_I_X._2_a] csplit This utility's functionality can sometimes be provided by the dd or sed utilities (i.e., although these utilities cannot easily provide all of csplit'_s features in one package, they can frequently be used for the type of task that csplit is being used for). [_P_O_S_I_X._2_a] cu Terminal oriented-not useful from shell scripts or typical application programs. [_P_O_S_I_X._N_e_t] cxref The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This utility is primarily a debugging tool. dc This utility's functionality can be provided by the bc utility; bc was selected because it was easier to use and had superior functionality. Although the historical versions of bc are implemented using dc as a base, this standard prescribes the interface and not the underlying mechanism used to implement it. Copyright c 1991 IEEE. All rights reserved. This is an unapproved IEEE Standards Draft, subject to change. 
982 E Rationale and Notes Part 2: SHELL AND UTILITIES P1003.2/D11.2 delta The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This SCCS utility is primarily a development tool. df As the standard does not address the concept or nature of file systems, this command could not be specified in a manner useful to portable applications. [_P_O_S_I_X._2_a] dircmp Although a useful concept, the traditional output of this directory comparison program is not suitable for processing in applications programs. Also, the diff -r command gives equivalent functionality. dis Disassemblers are hardware-specific. du Because of differences between systems in measuring disk usage, this utility could not be used reliably by a portable application. [_P_O_S_I_X._2_a] egrep Marked obsolescent and replaced by the new version of grep. ex This is typically a link to the vi terminal-oriented editor-not useful from shell scripts or typical application programs. The nonterminal oriented facilities of ex are provided by ed. [_P_O_S_I_X._2_a] fgrep Marked obsolescent and replaced by the new version of grep. file Determining the type of file is generally accomplished with test or find. The added information available with file is of little use to a portable application, particularly since there is considerable variation in its output contents. [_P_O_S_I_X._2_a] get The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This SCCS utility is primarily a development tool. ld Is subsumed by c89. line The functionality of line can be provided with read. lint The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This utility Copyright c 1991 IEEE. All rights reserved. 
This is an unapproved IEEE Standards Draft, subject to change. E.4 Execution Environment Utilities 983 P1003.2/D11.2 INFORMATION TECHNOLOGY--POSIX is primarily a debugging tool. login Terminal oriented-not useful from shell scripts or typical application programs. lorder This utility is an aid in creating an implementation- specific detail of object libraries that the working group did not feel required standardization. lpstat The LP system specified is the most basic possible and did not need this level of application control. [_P_O_S_I_X._7] m4 The working group did not find that this macro processor had sufficiently wide usage for standardization. mail This utility was omitted in favor of mailx, because there was a considerable functionality overlap between the two. The mail-sending aspects of mailx are covered in this standard, the mail-reading in the UPE. [_P_O_S_I_X._2_a] mesg Terminal oriented-not useful from shell scripts or typical application programs. [_P_O_S_I_X._2_a] mknod This was omitted in favor of mkfifo, as mknod has too many implementation-defined functions. [_P_O_S_I_X._7] newgrp Terminal oriented-not useful from shell scripts or typical application programs. [_P_O_S_I_X._2_a] news Terminal oriented-not useful from shell scripts or typical application programs. nice Due to historical variations in usage, and in the lack of underlying support from possible POSIX.1 {8} base systems, this cannot be used by applications to achieve reliable results. [_P_O_S_I_X._2_a] nl The useful functionality of nl can be provided with pr. nm The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This utility is primarily a debugging tool. [_P_O_S_I_X._2_a] pack The working group found little interest in a portable data compression program (and there are others that are probably more widely used anyway). Copyright c 1991 IEEE. All rights reserved. 
This is an unapproved IEEE Standards Draft, subject to change. 984 E Rationale and Notes Part 2: SHELL AND UTILITIES P1003.2/D11.2 passwd Terminal oriented-not useful from shell scripts or typical application programs. (There was also sentiment to avoid security-related utilities until requirements of 1003.6 are known.) pcat The working group found little interest in a portable data compression program (and there are others that are probably more widely used anyway). pg Terminal oriented-not useful from shell scripts or typical application programs. prof The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This utility is primarily a debugging tool. prs The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This SCCS utility is primarily a development tool. ps This utility has historically been difficult to specify portably due to the many implementation-defined aspects of processes. Furthermore, a portable application can rarely rely on information about what other processes are doing, as security mechanisms may prevent it. A process requiring one of its children's process IDs (such as for use with the kill command) will have to record the IDs at the time of creation. [_P_O_S_I_X._2_a] red Restricted editor. This was not considered by the working group because it never provided the level of security restriction required. rmdel The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This SCCS utility is primarily a development tool. rsh Restricted shell. This was not considered by the working group because it does not provide the level of security 1 restriction that is implied by historical documentation. 
1 sact The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This SCCS utility is primarily a development tool. Copyright c 1991 IEEE. All rights reserved. This is an unapproved IEEE Standards Draft, subject to change. E.4 Execution Environment Utilities 985 P1003.2/D11.2 INFORMATION TECHNOLOGY--POSIX sdb The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This utility is primarily a debugging tool. Furthermore, some useful aspects of sdb are very hardware-specific. sdiff The ``side-by-side diff'' utility from System V was omitted because it is used infrequently, and even less so by portable applications. Despite being in System V, it is not in the _S_V_I_D or _X_P_G. shar Utilities with this type of functionality (``shell-based archivers'') are in wide use, despite not being included in System V or BSD systems. However, the working group felt this sort of program was more widely used by human users than portable applications. shl Terminal oriented-not useful from shell scripts or typical application programs. The job control aspects of the Shell Command Language are generally more useful and are being evaluated for the UPE. size The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This utility is primarily a debugging tool. spell Not useful from shell scripts or typical application programs. split The functionality can sometimes be provided by the dd, sed, or (for some uses) xargs utilities (i.e., although these utilities cannot easily provide all of split'_s features in one package, they can sometimes be used for the type of task that split is being used for). 
[_P_O_S_I_X._2_a] strings This is normally used by human users during debugging, rather than by applications. [_P_O_S_I_X._2_a] su Not useful from shell scripts or typical application programs. (There was also sentiment to avoid security- related utilities until requirements of POSIX.6 are known.) sum This utility was renamed cksum. tabs Terminal oriented-not useful from shell scripts or typical application programs. [_P_O_S_I_X._2_a] Copyright c 1991 IEEE. All rights reserved. This is an unapproved IEEE Standards Draft, subject to change. 986 E Rationale and Notes Part 2: SHELL AND UTILITIES P1003.2/D11.2 time Not necessary for portable applications. It is frequently used by human users in debugging or for informal benchmarks. It is doubtful whether any standardized definitions of the output could be agreed upon. tsort This utility is an aid in creating an implementation- specific detail of object libraries that the working group did not feel required standardization. unget The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This SCCS utility is primarily a development tool. unpack The working group found little interest in a portable data compression program (and there are others that are probably more widely used anyway). uucp uulog uupick uustat uuto The UUCP utilities and their protocol description were 1 removed from an early draft because responsibility for 1 them was officially requested by the POSIX group 1 developing networking interfaces. 1 val The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This SCCS utility is primarily a development tool. vi Terminal oriented-not useful from shell scripts or typical application programs. [_P_O_S_I_X._2_a] wall Terminal oriented-not useful from shell scripts or typical application programs. 
It is generally used by system administrators, as well. [_P_O_S_I_X._7] what The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This SCCS utility is primarily a development tool. who The ability to determine other users on the system was felt to be at risk in a trusted implementation, so its use could not be considered by a portable application. Copyright c 1991 IEEE. All rights reserved. This is an unapproved IEEE Standards Draft, subject to change. E.4 Execution Environment Utilities 987 P1003.2/D11.2 INFORMATION TECHNOLOGY--POSIX [_P_O_S_I_X._2_a] write Terminal oriented-not useful from shell scripts or typical application programs. [_P_O_S_I_X._2_a] _E._4._1 awk - _P_a_t_t_e_r_n _s_c_a_n_n_i_n_g _a_n_d _p_r_o_c_e_s_s_i_n_g _l_a_n_g_u_a_g_e _E._4._2 basename - _R_e_t_u_r_n _n_o_n_d_i_r_e_c_t_o_r_y _p_o_r_t_i_o_n _o_f _p_a_t_h_n_a_m_e _E._4._3 bc - _A_r_b_i_t_r_a_r_y-_p_r_e_c_i_s_i_o_n _a_r_i_t_h_m_e_t_i_c _l_a_n_g_u_a_g_e _E._4._4 cat - _C_o_n_c_a_t_e_n_a_t_e _a_n_d _p_r_i_n_t _f_i_l_e_s _E._4._5 cd - _C_h_a_n_g_e _w_o_r_k_i_n_g _d_i_r_e_c_t_o_r_y _E._4._6 chgrp - _C_h_a_n_g_e _f_i_l_e _g_r_o_u_p _o_w_n_e_r_s_h_i_p _E._4._7 chmod - _C_h_a_n_g_e _f_i_l_e _m_o_d_e_s _E._4._8 chown - _C_h_a_n_g_e _f_i_l_e _o_w_n_e_r_s_h_i_p _E._4._9 cksum - _W_r_i_t_e _f_i_l_e _c_h_e_c_k_s_u_m_s _a_n_d _b_l_o_c_k _c_o_u_n_t_s _E._4._1_0 cmp - _C_o_m_p_a_r_e _t_w_o _f_i_l_e_s _E._4._1_1 comm - _S_e_l_e_c_t _o_r _r_e_j_e_c_t _l_i_n_e_s _c_o_m_m_o_n _t_o _t_w_o _f_i_l_e_s _E._4._1_2 command - _S_e_l_e_c_t _o_r _r_e_j_e_c_t _l_i_n_e_s _c_o_m_m_o_n _t_o _t_w_o _f_i_l_e_s _E._4._1_3 cp - _C_o_p_y _f_i_l_e_s Copyright c 1991 IEEE. All rights reserved. This is an unapproved IEEE Standards Draft, subject to change. 
E.4.14 cut - Cut out selected fields of each line of a file
E.4.15 date - Write the date and time
E.4.16 dd - Convert and copy a file
E.4.17 diff - Compare two files
E.4.18 dirname - Return directory portion of pathname
E.4.19 echo - Write arguments to standard output
E.4.20 ed - Edit text
E.4.21 env - Set environment for command invocation
E.4.22 expr - Evaluate arguments as an expression
E.4.23 false - Return false value
E.4.24 find - Find files
E.4.25 fold - Filter for folding lines
E.4.26 getconf - Get configuration values
E.4.27 getopts - Parse utility options
E.4.28 grep - File pattern searcher
E.4.29 head - Copy the first part of files
E.4.30 id - Return user identity
E.4.31 join - Relational database operator
E.4.32 kill - Terminate or signal processes
E.4.33 ln - Link files
E.4.34 locale - Get locale-specific information
E.4.35 localedef - Define locale environment
E.4.36 logger - Log messages
E.4.37 logname - Return user's login name
E.4.38 lp - Send files to a printer
E.4.39 ls - List directory contents
E.4.40 mailx - Process messages
E.4.41 mkdir - Make directories
E.4.42 mkfifo - Make FIFO special files
E.4.43 mv - Move files
E.4.44 nohup - Invoke a utility immune to hangups
E.4.45 od - Dump files in various formats
E.4.46 paste - Merge corresponding or subsequent lines of files
E.4.47 pathchk - Check pathnames
E.4.48 pax - Portable archive interchange
E.4.49 pr - Print files
E.4.50 printf - Write formatted output
E.4.51 pwd - Return working directory name
E.4.52 read - Read a line from standard input
E.4.53 rm - Remove directory entries
E.4.54 rmdir - Remove directories
E.4.55 sed - Stream editor
E.4.56 sh - Shell, the standard command language interpreter
E.4.57 sleep - Suspend execution for an interval
E.4.58 sort - Sort, merge, or sequence check text files
E.4.59 stty - Set the options for a terminal
E.4.60 tail - Copy the last part of a file
E.4.61 tee - Duplicate standard input
E.4.62 test - Evaluate expression
E.4.63 touch - Change file access and modification times
E.4.64 tr - Translate characters
E.4.65 true - Return true value
E.4.66 tty - Return user's terminal name
E.4.67 umask - Get or set the file mode creation mask
E.4.68 uname - Return system name
E.4.69 uniq - Report or filter out repeated lines in a file
E.4.70 wait - Await process completion
E.4.71 wc - Word, line, and byte count
E.4.72 xargs - Construct argument list(s) and invoke utility

E.5 User Portability Utilities Option

Editor's Note: This section is unused in this revision of the standard.

E.6 Software Development Utilities Option

Editor's Note: The text of the Rationale for this section has been temporarily located in Section 6, adjacent to the text it is explaining. The text will return to this annex after the completion of balloting.

This is the first of the optional utility environments. The working group decided there were two basic classes of systems to be supported: general application execution and software development. The first is widely used and is the primary reason for the development of this standard. The second, however, represents only a (possibly small) subset of the first; its users are generally only those who are developing or installing C or FORTRAN applications. Therefore, all the development environments are optional, giving users the option of specifying a smaller and presumably less expensive system. There are three separate optional environments, so that C-only or FORTRAN-only users do not have to specify unneeded components. As further languages are supported by this standard, their environments will also be optional.

An implementation must provide all three of these utilities to claim conformance to this section. See section E.4 for a discussion of utilities excluded from this group.

E.6.1 ar - Create and maintain library archives
E.6.2 make - Maintain, update, and regenerate groups of programs
E.6.3 strip - Remove unnecessary information from executable files

E.7 Language-Independent System Services

Editor's Note: The text of the Rationale for this section has been temporarily located in Section 7, adjacent to the text it is explaining. The text will return to this annex after the completion of balloting.
E.7.1 Shell Command Interface
E.7.2 Access Environment Variables
E.7.3 Regular Expression Matching
E.7.4 Pattern Matching
E.7.5 Command Option Parsing
E.7.6 Generate Pathnames Matching a Pattern
E.7.7 Perform Word Expansions
E.7.8 Get POSIX Configurable Variables
E.7.9 Locale Control

E.8 C Language Development Utilities Option

Editor's Note: The text of the Rationale for this section has been temporarily located in Annex A, adjacent to the text it is explaining. The text will return to this annex after the completion of balloting.

This is the second of the optional utility environments. An implementation must provide all three of these utilities to claim conformance to this section. See section E.4 for a discussion of utilities excluded from this group.

E.8.1 c89 - Compile Standard C programs
E.8.2 lex - Generate programs for lexical tasks
E.8.3 yacc - Yet another compiler compiler

E.9 C Language Bindings Option

Editor's Note: The text of the Rationale for this section has been temporarily located in Annex B, adjacent to the text it is explaining. The text will return to this annex after the completion of balloting.
E.9.1 C Language Definitions
E.9.2 C Numerical Limits
E.9.3 C Binding for Shell Command Interface
E.9.4 C Binding for Access Environment Variables
E.9.5 C Binding for Regular Expression Matching
E.9.6 C Binding for Match Filename or Pathname
E.9.7 C Binding for Command Option Parsing
E.9.8 C Binding for Generate Pathnames Matching a Pattern
E.9.9 C Binding for Perform Word Expansions
E.9.10 C Binding for Get POSIX Configurable Variables
E.9.11 C Binding for Locale Control

E.10 FORTRAN Development and Runtime Utilities Options

Editor's Note: The text of the Rationale for this section has been temporarily located in Annex C, adjacent to the text it is explaining. The text will return to this annex after the completion of balloting.

These are the third and fourth of the optional utility environments. See section E.4 for a discussion of utilities excluded from this group.

E.10.1 asa - Interpret carriage control characters
E.10.2 fort77 - FORTRAN compiler

Annex F
(informative)
Sample National Profile

Editor's Note: All uses of the term ``character set'' in this annex have been changed to ``coded character set'' without further diff marks.

This annex is an example of a country's needs with respect to this standard and how those needs relate to other international standards as well as national standards. The example provided is included here for informative purposes and is not a formal standard in the country in question. It is provided by the Danish Standards Association1) and is as accurate as possible with regard to Danish needs.

__________
1) Further information may be obtained from the Danish Standards Association, Attn: S142u22A8, Baunegaardsvej 73, DK-2900 Hellerup, Denmark; FAX: +45 39 77 02 02; Email: u22a8@dkuug.dk
The data is also available electronically by anonymous FTP or FTAM at the site dkuug.dk in the directory i18n, where some other example national profiles, locales, and charmaps may also be found. They are also available from an archive server reached at archive@dkuug.dk; use ``Subject: help'' for further information.

F.1 (Example) Danish National Profile

This is the definition of the Danish Standards Association POSIX.2 profile. The subset of conforming implementations that provide the required characteristics below is referred to as conforming to the ``Danish Standards Association (DS) Environment Profile'' for this standard.

This profile specifies the following requirements on implementations:

(1) In POSIX.2 section 2.13.1, the limit {COLL_WEIGHTS_MAX} shall be provided with a value of 4. All other limits shall conform to at least the minimum values shown in Table 2-16.

(2) The following options shall be supported according to POSIX.2 section 2.13.2:

      POSIX2_C_BIND      Optional.
      POSIX2_C_DEV       Optional.
      POSIX2_FORT_DEV    Optional.
      POSIX2_FORT_RUN    Optional.
      POSIX2_LOCALEDEF   Required; the system shall support the creation of locales as described in 4.35.
      POSIX2_SW_DEV      Optional.
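Two of the profile requirements above are values that a system reports through getconf, so a script can check them. The following minimal POSIX sh sketch is not part of the profile. It assumes that getconf accepts the names COLL_WEIGHTS_MAX and POSIX2_LOCALEDEF. It also reads ``a value of 4'' as a lower bound (4 or more), which is an interpretation rather than the profile's wording. The function names ds_profile_check and ds_profile_check_system are hypothetical.

```shell
#!/bin/sh
# Sketch: check the getconf-visible requirements of the example Danish
# (DS) profile. Function names are hypothetical, not from the profile.

# ds_profile_check WEIGHTS LOCALEDEF
#   WEIGHTS   - value reported for {COLL_WEIGHTS_MAX}
#   LOCALEDEF - value reported for POSIX2_LOCALEDEF
# Writes one line per unmet requirement; returns 0 only if both are met.
ds_profile_check() {
    dpc_status=0

    # Requirement (1): {COLL_WEIGHTS_MAX} of 4. Treating 4 as a lower
    # bound is an assumption made by this sketch.
    case $1 in
        '' | *[!0-9]*)
            echo "COLL_WEIGHTS_MAX: not reported as a number"
            dpc_status=1 ;;
        *)
            if [ "$1" -lt 4 ]; then
                echo "COLL_WEIGHTS_MAX: $1 is less than 4"
                dpc_status=1
            fi ;;
    esac

    # Requirement (2): POSIX2_LOCALEDEF is required. getconf writes
    # "undefined" for a valid name that has no value; an empty value or
    # -1 is also treated here as "not supported".
    case $2 in
        '' | undefined | -1)
            echo "POSIX2_LOCALEDEF: not supported"
            dpc_status=1 ;;
    esac

    return "$dpc_status"
}

# Query the running system; diagnostics from getconf are discarded.
ds_profile_check_system() {
    ds_profile_check "$(getconf COLL_WEIGHTS_MAX 2>/dev/null)" \
                     "$(getconf POSIX2_LOCALEDEF 2>/dev/null)"
}
```

On a system that meets both checks, ds_profile_check_system writes nothing and returns 0. Otherwise it reports each unmet requirement on its own line. The checks take their values as arguments so that they can be exercised without depending on the host configuration.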
F.1.1 Danish Locale Model

Editor's Note: This subclause is offered as rationale for the current state of this example annex. It will not necessarily appear in this form in any final version of the annex.

Creating a national locale for Denmark has been quite an elaborate effort. Time and again, we thought we had reached an agreement on the locale, but then some aspect disrupted the entire work, and we more or less had to start all over. We think we have traced these problems to a general uncertainty regarding the exact purpose of a ``national'' locale. Looking at the Danish situation (which we know pretty well by now), we have identified several levels of locales, depending on the ``complexity'' of the collating sequence (or, more generally, of the rules for sorting different kinds of text):

(1) Byte/machine level. Here everything is sorted according to the character's byte value.

(2) Character/utility level. Here we want to work almost on the same level as (1), i.e., character by character, but obeying a (simple) collating sequence that ensures that, for example, uppercase and lowercase letters are equivalent, or that national characters are sorted correctly. The characters still do not have any ``implicit'' meaning, and the comparison of two strings is still deterministic; i.e., strings that are different at level 1 are still different at level 2.

(3) Text/application level. Here we want to be able to search in text looking for specific words or items. The comparison is still performed on a character-by-character basis, but possibly ignoring some characters that are not important, and determinism is not important either.

(4) Semantic/dictionary/library/phone-book level. Entire words like ``the'' are omitted from comparisons; maybe soundex is required. This probably requires specially developed software.

Our problem has been the conflicting requirements of these levels, which we optimistically tried to combine into a single national locale (ignoring level 4, however). The POSIX Locale is aimed at level 2; i.e., at a rather low level. Many of our attempts to write a national Danish locale failed because we actually tried to write a level 3 locale and found that it did not work as an alternative to the default POSIX Locale at level 2. The locale we now provide is the final compromise between level 2 and level 3: we took our latest attempt aimed at level 3 and made the comparison completely deterministic, thus bringing it down to level 2.

We have also found that we may need to include more information in the identification of a specific locale than just the country code, the language code, and the coded character set, since what we have had most problems with was the purpose or scope of a specific locale. Is it just a nationalized version of the POSIX Locale (e.g., extended with Æ, Ø, and Å at the proper positions), is it aimed at text search (ignoring certain characters), or is it on an even higher level? Many such alternative locales would certainly be useful for various classes of problems or applications, so our model for the locale name identification string includes a <version> parameter.

We hope that these comments clarify our intentions with the locale definitions and save other countries from repeating our mistakes.
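The locale name model described above is fixed by the guideline in F.2 as "%2.2s_%2.2s.%s,%s". A small POSIX sh sketch can illustrate it. The sketch composes a name from its parts and adds the <period> and <comma> parts only when a coded character set or version is given. It also splits a name back into its four parts and applies the CHARSET and ISO 8859-1 fallbacks described in F.2. Several details are assumptions of this sketch and are not defined by the profile: the function names make_locale_name and parse_locale_name, the output layout, and the spelling ISO_8859-1 for the default coded character set. A purely numeric coded-character-set specification is passed through unchanged.

```shell
#!/bin/sh
# Sketch of the locale identification string guideline:
#   "%2.2s_%2.2s.%s,%s", <language>, <territory>, <coded-character-set>, <version>
# Function names and the output layout are hypothetical.

# make_locale_name LANGUAGE TERRITORY [CODESET [VERSION]]
make_locale_name() {
    # Two characters each; language in lowercase, territory in uppercase.
    mln_lang=$(printf '%2.2s' "$1" | tr '[:upper:]' '[:lower:]')
    mln_terr=$(printf '%2.2s' "$2" | tr '[:lower:]' '[:upper:]')
    mln_name=${mln_lang}_$mln_terr
    # The coded character set follows a <period>, the version a <comma>.
    if [ -n "${3-}" ]; then mln_name=$mln_name.$3; fi
    if [ -n "${4-}" ]; then mln_name=$mln_name,$4; fi
    printf '%s\n' "$mln_name"
}

# parse_locale_name NAME
# Writes: language=... territory=... codeset=... version=...
parse_locale_name() {
    pln_rest=$1
    pln_version=
    pln_codeset=
    pln_territory=

    # Version: everything after the first <comma>.
    case $pln_rest in
        *,*) pln_version=${pln_rest#*,}; pln_rest=${pln_rest%%,*} ;;
    esac
    # Coded character set: everything after the first <period>.
    case $pln_rest in
        *.*) pln_codeset=${pln_rest#*.}; pln_rest=${pln_rest%%.*} ;;
    esac
    # Language and territory are separated by an underscore.
    pln_lang=${pln_rest%%_*}
    case $pln_rest in
        *_*) pln_territory=${pln_rest#*_} ;;
    esac

    # No coded character set given: use CHARSET; if that is unset or
    # null, assume ISO 8859-1 (spelled ISO_8859-1 here, an assumption).
    if [ -z "$pln_codeset" ]; then
        pln_codeset=${CHARSET:-ISO_8859-1}
    fi

    printf 'language=%s territory=%s codeset=%s version=%s\n' \
        "$pln_lang" "$pln_territory" "$pln_codeset" "$pln_version"
}
```

For example, make_locale_name da DK yields da_DK, the name required for the Danish locale in F.3. Adding a version, as in make_locale_name da DK '' dictionary, yields da_DK,dictionary. The version is split off first so that a <period> inside a version string cannot be mistaken for the start of a coded character set.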
F.2 Locale String Definition Guideline

The following guideline is used for specifying the locale identification string:2)

      "%2.2s_%2.2s.%s,%s", <language>, <territory>, <coded-character-set>, <version>

where <language> shall be taken from ISO 639 {B1} and <territory> shall be the two-letter country code of ISO 3166 {B4}, if possible. The <language> shall be specified with lowercase letters only, and the <territory> shall be specified in uppercase letters only. An optional <coded-character-set> specification naming the coded character set may follow after a <period>; if only a numeric specification is present, it shall represent the number of the international standard describing the coded character set. If the <coded-character-set> specification is not present, the encoded character-set-specific locale shall be determined by the CHARSET environment variable, and if this is unset or null, the encoding of ISO 8859-1 {5} shall be assumed. A parameter specifying a <version> of the locale may be placed after the optional <coded-character-set> specification, delimited by a <comma>. This may be used to distinguish between different cultural needs; for instance, dictionary order versus a more systems-oriented collating order.

F.3 Scope of Danish National Locale

This national locale covers the Danish language in Denmark. In addition, Faroese and Greenlandic LC_TIME and LC_MESSAGES specifications have been defined; the rest of the Danish national locale shall be used for these locales as well. This locale is designed to be coded character-set independent.
It completely specifies the behavior of systems based on ISO/IEC 10646 {B11} (with ISO 6429 {B5} control character encoding) together with many 7-bit and 8-bit encoded character sets, including the ISO 8859 character sets and major vendor-specific 8-bit character sets (with ISO 6429 {B5} or ISO/IEC 646 {1} control character encoding when applicable). This locale is portable as long as the character naming in the charmap description file ISO_10646 for ISO/IEC 10646 {B11} is followed. Examples of such charmap files for ISO/IEC 10646 {B11} and ISO 8859-1 {5} are shown in F.5.1 and F.5.2.

The collating sequence is completely deterministic and is aimed at use in system tools. Other Danish collation sequences with nondeterministic properties, which may be needed by some application programs, are not covered by this locale. The LC_CTYPE category of the locale is quite general and may be useful for other locales; the LC_COLLATE category, though specifically Danish, may also be a good template from which to generate other locales.

Following the preceding guidelines for locale names, the national Danish locale string shall be:

    da_DK

F.3.1 da_DK - (Example) Danish National Locale

    escape_char /
    comment_char %
    % Danish example national locale for the language Danish
    % Source: Danish Standards Association
    % Revision 1.7 1991-05-07
    LC_CTYPE
    digit   <0>;<1>;<2>;<3>;<4>;<5>;<6>;<7>;<8>;<9>
    xdigit  <0>;<1>;<2>;<3>;<4>;<5>;<6>;<7>;<8>;<9>;/

[Remainder of the LC_CTYPE listing (the rest of xdigit, and the blank, space, upper, lower, alpha, cntrl, and punct classes) not recoverable from this copy: its symbolic character names were lost.]

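The identification-string guideline of F.2 can be sketched in Python as a formatter, a parser, and the code-set fallback rule. This is a minimal illustration, not part of the standard: LocaleId, format_locale_id, parse_locale_id, and effective_codeset are hypothetical names, the fallback value "ISO 8859-1" is just a label standing for that encoding, and the version label "dictionary" used below is invented for the example.

```python
import re
from typing import Mapping, NamedTuple, Optional


class LocaleId(NamedTuple):
    """Fields of "%2.2s_%2.2s.%s,%s" (hypothetical helper type)."""
    language: str                   # ISO 639 code, lowercase
    territory: str                  # ISO 3166 two-letter code, uppercase
    codeset: Optional[str] = None   # optional, introduced by <period>
    version: Optional[str] = None   # optional, introduced by <comma>


def format_locale_id(lid: LocaleId) -> str:
    # "%2.2s" pads or truncates each field to exactly two characters.
    text = "%2.2s_%2.2s" % (lid.language.lower(), lid.territory.upper())
    if lid.codeset:
        text += "." + lid.codeset
    if lid.version:
        text += "," + lid.version
    return text


_LOCALE_ID = re.compile(r"([a-z]{2})_([A-Z]{2})(?:\.([^,]+))?(?:,(.+))?")


def parse_locale_id(text: str) -> LocaleId:
    m = _LOCALE_ID.fullmatch(text)
    if m is None:
        raise ValueError(f"not a locale identification string: {text!r}")
    return LocaleId(*m.groups())    # absent optional fields become None


def effective_codeset(lid: LocaleId, environ: Mapping[str, str]) -> str:
    # No code set in the name: use CHARSET; if that is unset or null,
    # fall back to ISO 8859-1.
    return lid.codeset or environ.get("CHARSET") or "ISO 8859-1"


if __name__ == "__main__":
    import os
    print(format_locale_id(LocaleId("da", "DK")))
    print(parse_locale_id("da_DK.8859-1,dictionary"))
    print(effective_codeset(parse_locale_id("da_DK"), os.environ))
```

For example, format_locale_id(LocaleId("da", "DK")) yields da_DK, the Danish locale string named in F.3.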
Each symbolic name specified in Table 2-3 shall be included in the file and shall be mapped to a unique encoding value (except for those symbolic names that are shown with identical glyphs). If the control characters commonly associated with the symbolic names in Table 2-4 are supported by the implementation, the symbolic names and their corresponding encoding values shall be included in the file. Some of the values associated with the symbolic names in this table may also be contained in Table 2-3.

Table 2-4 - Control Character Set
[Table contents not recoverable from this copy.]

The following declarations can precede the character definitions. Each shall consist of the symbol shown in the following list, starting in column 1, including the surrounding brackets, followed by one or more <blank>s, followed by the value to be assigned to the symbol.

<code_set_name>
     The name of the coded character set for which the character set description file is defined. The characters of the name shall be taken from the set of characters with visible glyphs defined in Table 2-3.

<mb_cur_max>
     The maximum number of bytes in a multibyte character. This shall default to 1.

<mb_cur_min>
     An unsigned positive integer value that shall define the minimum number of bytes in a character for the encoded character set.
     The value shall be less than or equal to mb_cur_max. If not specified, the minimum number shall be equal to mb_cur_max.

<escape_char>
     The escape character used to indicate that the characters following shall be interpreted in a special way, as defined later in this subclause. This shall default to backslash (\), which is the character glyph used in all the following text and examples, unless otherwise noted.

<comment_char>
     The character that, when placed in column 1 of a charmap line, is used to indicate that the line shall be ignored. The default character shall be the number-sign (#).

The character set mapping definitions shall be all the lines immediately following an identifier line containing the string CHARMAP starting in column 1, and preceding a trailer line containing the string END CHARMAP starting in column 1. Empty lines and lines containing a comment_char in the first column shall be ignored. Each noncomment line of the character set mapping definition (i.e., between the CHARMAP and END CHARMAP lines of the file) shall be in either of two forms:

    "%s %s %s\n", <symbolic-name>, <encoding>, <comments>

or

    "%s...%s %s %s\n", <symbolic-name>, <symbolic-name>, <encoding>, <comments>

In the first format, the line in the character set mapping definition defines a single symbolic name and a corresponding encoding. A symbolic name is one or more characters from the set shown with visible glyphs in Table 2-3, enclosed between angle brackets. A character following an escape character shall be interpreted as itself; for example, the sequence ``<\\\>>'' represents the symbolic name ``\>'' enclosed between angle brackets.

In the second format, the line in the character set mapping definition defines a range of one or more symbolic names.
In this form, the symbolic names shall consist of zero or more nonnumeric characters from the set shown with visible glyphs in Table 2-3, followed by an integer formed by one or more decimal digits. The characters preceding the integer shall be identical in the two symbolic names, and the integer formed by the digits in the second symbolic name shall be equal to or greater than the integer formed by the digits in the first name. This shall be interpreted as a series of symbolic names formed from the common part and each of the integers between the first and the second integer, inclusive. As an example, <j0101>...<j0104> is interpreted as the symbolic names <j0101>, <j0102>, <j0103>, and <j0104>, in that order.

A character set mapping definition line shall exist for all symbolic names specified in Table 2-3, and shall define the coded character value that corresponds with the character glyph indicated in the table, or the coded character value that corresponds with the control character symbolic name. If the control characters commonly associated with the symbolic names in Table 2-4 are supported by the implementation, the symbolic name and the corresponding encoding value shall be included in the file. Additional unique symbolic names may be included. A coded character value can be represented by more than one symbolic name.
The encoding part shall be expressed as one (for single-byte character values) or more concatenated decimal, octal, or hexadecimal constants in the following formats:

    "%cd%d", <escape_char>, <decimal byte value>
    "%cx%x", <escape_char>, <hexadecimal byte value>
    "%c%o", <escape_char>, <octal byte value>

Decimal constants shall be represented by two or three decimal digits, preceded by the escape character and the lowercase letter d; for example, \d05, \d97, or \d143. Hexadecimal constants shall be represented by two hexadecimal digits, preceded by the escape character and the lowercase letter x; for example, \x05, \x61, or \x8f. Octal constants shall be represented by two or three octal digits, preceded by the escape character; for example, \05, \141, or \217. In a portable charmap file, each constant shall represent an 8-bit byte. Implementations supporting other byte sizes may allow constants to represent values larger than those that can be represented in 8-bit bytes, and may allow additional digits in constants. When constants are concatenated for multibyte character values, they shall be of the same type and shall be interpreted in byte order from left to right. The manner in which constants are represented in the character is implementation defined. Omitting bytes from a multibyte character definition produces undefined results.

In lines defining ranges of symbolic names, the encoded value is the value for the first symbolic name in the range (the symbolic name preceding the ellipsis). Subsequent symbolic names defined by the range shall have encoding values in increasing order. For example, the line

    <j0101>...<j0104>  \d129\d254

shall be interpreted as

    <j0101>  \d129\d254
    <j0102>  \d129\d255
    <j0103>  \d130\d0
    <j0104>  \d130\d1

The comment is optional.
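A short Python sketch can check the constant formats and the range rule just described. parse_encoding, range_encodings, and max_digits are hypothetical helpers. The sketch assumes 8-bit bytes (the portable case) and reads concatenated constants left to right as the byte sequence of one character, which is only one possible choice because the representation in the character is implementation defined. range_encodings treats the bytes as one big-endian number, which reproduces the carry in the example, and max_digits does the arithmetic behind the longer constants that wider bytes would need.

```python
import re


def parse_encoding(text, esc="\\"):
    """Parse concatenated constants such as '\\d129\\d254' into byte values.

    Assumes 8-bit bytes: decimal constants have two or three digits,
    hexadecimal constants two, and octal constants two or three.
    """
    token = re.compile(re.escape(esc) +
                       r"(?:d([0-9]{2,3})|x([0-9a-fA-F]{2})|([0-7]{2,3}))")
    values, kinds, pos = [], set(), 0
    while pos < len(text):
        m = token.match(text, pos)
        if m is None:
            raise ValueError(f"bad constant at offset {pos} in {text!r}")
        dec, hexa, octal = m.groups()
        if dec is not None:
            kinds.add("decimal")
            values.append(int(dec))
        elif hexa is not None:
            kinds.add("hexadecimal")
            values.append(int(hexa, 16))
        else:
            kinds.add("octal")
            values.append(int(octal, 8))
        pos = m.end()
    if not values:
        raise ValueError("empty encoding")
    if len(kinds) > 1:
        raise ValueError("concatenated constants must all be of the same type")
    if max(values) > 0xFF:
        raise ValueError("constant does not fit in an 8-bit byte")
    return values


def range_encodings(first, count):
    """Encodings for the `count` names of a range: treat the bytes as one
    big-endian number and add 1 per name, so a carry moves into the
    preceding byte."""
    width = len(first)
    start = int.from_bytes(bytes(first), "big")
    if start + count - 1 >= 256 ** width:
        raise OverflowError("range runs past the largest value of this width")
    return [list((start + k).to_bytes(width, "big")) for k in range(count)]


def max_digits(bits):
    """Digits needed to write the largest value of a `bits`-bit byte."""
    top = (1 << bits) - 1
    return {"octal": len(format(top, "o")),
            "hexadecimal": len(format(top, "x")),
            "decimal": len(str(top))}
```

range_encodings([129, 254], 4) returns [[129, 254], [129, 255], [130, 0], [130, 1]], the four encodings listed above, and max_digits(16) reports 6 octal, 4 hexadecimal, and 5 decimal digits for a 16-bit byte.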
For the interpretation of the dollar-sign and the number-sign, see 2.2.2.37 and 2.2.2.93.

BEGIN_RATIONALE

2.4.2 Character Set Rationale. (This subclause is not a part of P1003.2)

The portable character set is listed in full so there is no dependency on the ISO/IEC 646 {1} (or historically ASCII) encoded character set, although the set is identical to the characters defined in the International Reference Version of ISO/IEC 646 {1}.

This standard imposes no requirement that multiple character sets or code sets be supported, leaving this as a marketing differentiation for implementors. Although multiple charmap files are supported, it is the responsibility of the implementation to provide the file(s); if only one is provided, only that one will be accessible using the localedef utility's -f option (although in the case of just one file on the system, -f is not useful).

The statement about invariance in code sets for the portable character set is worded as it is to avoid precluding implementations where multiple incompatible code sets are available (say, ASCII and EBCDIC). The standard utilities cannot be expected to produce predictable results if they access portable characters that vary on the same implementation.

The character set description file provides:

- the capability to describe character set attributes (such as collation order or character classes) independent of character set encoding, and using only the characters in the portable character set. This makes it possible to create ``generic'' localedef source files for all code sets that share the portable character set (such as the ISO 8859 family or IBM Extended ASCII).
- standardized symbolic names for all characters in the portable character set, making it possible to refer to any such character regardless of encoding.

Implementations are free to describe more than one code set in a character set description file, as long as only one encoding exists for the characters in Table 2-3. For example, if an implementation defines ISO 8859-1 {5} as the primary code set, and ISO 8859-2 {6} as an alternate set, with each character from the alternate code set preceded in data by a shift code, a character set description file could contain a complete description of the primary set and of those characters from the alternate set that are not identical to characters in the primary set, the encoding of the latter including the shift code.

Implementations are free to choose their own symbolic names, as long as the names identified by this standard are also defined; this provides support for already existing ``character names.''

The names selected for the members of the portable character set follow the ISO 8859 {5} and ISO/IEC 10646 {B11} standards. However, several commonly used UNIX system names occur as synonyms in the list:

- The traditional UNIX system names are used for control characters.
- The word ``slash'' is used in addition to ``solidus.''
- The word ``backslash'' is used in addition to ``reverse-solidus.''
- The word ``hyphen'' is used in addition to ``hyphen-minus.''
- The word ``period'' is used in addition to ``full-stop.''
- For the digits, the word ``digit'' is eliminated.
- For letters, the words ``Latin Capital Letter'' and ``Latin Small Letter'' are eliminated.
- The words ``left-brace'' and ``right-brace'' are used in addition to ``left-curly-bracket'' and ``right-curly-bracket.''
- The names of the digits are preferred over the numbers, to avoid possible confusion between ``0'' and ``O'', and between ``1'' and ``l'' (one and the letter ell).

The names for the control characters in Table 2-4 were taken from ISO 4873 {4}.

The charmap file was introduced to resolve problems with the portability of, in particular, localedef sources. This standard assumes that the portable character set is constant across all locales, but does not prohibit implementations from supporting two incompatible codings, such as both ASCII and EBCDIC. Such ``dual-support'' implementations should have all charmaps and localedef sources encoded using one portable character set, in effect ``cross-compiling'' for the other environment. Naturally, charmaps (and localedef sources) are only portable without transformation between systems using the same encodings for the portable character set. They can nonetheless be transformed between two sets using only a subset of the actual characters (the portable set). However, the particular coded character set used for an application or an implementation does not necessarily imply different characteristics or collation: on the contrary, these attributes should in many cases be identical, regardless of code set. The charmap provides the capability to define a common locale definition for multiple code sets (the same localedef source can be used for code sets with different extended characters; the ability in the charmap to define ``empty'' names allows for characters missing in certain code sets).
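To make this portability argument concrete, a minimal Python sketch can resolve one localedef-style string of symbolic names through two charmaps. The charmaps here are toy stand-ins built from Python's ascii and cp037 (an IBM EBCDIC code page) codecs and cover only three letters; toy_charmap and resolve are hypothetical helpers, and any text outside angle brackets is simply ignored.

```python
import re


def toy_charmap(codec, letters):
    """Hypothetical charmap for single-letter symbolic names such as <M>,
    built from a Python codec instead of a real charmap file."""
    return {ch: ch.encode(codec) for ch in letters}


def resolve(source, charmap):
    """Replace each <name> in `source` with its encoding from `charmap`."""
    names = re.findall(r"<([^<>]+)>", source)
    missing = [n for n in names if n not in charmap]
    if missing:
        raise KeyError(f"symbolic names not in charmap: {missing}")
    return b"".join(charmap[n] for n in names)


source = "<M><a><y>"   # the same localedef source text for both code sets
ascii_map = toy_charmap("ascii", "May")
ebcdic_map = toy_charmap("cp037", "May")

if __name__ == "__main__":
    print(resolve(source, ascii_map))    # b'May'
    print(resolve(source, ebcdic_map))   # b'\xd4\x81\xa8'
```

The same source text yields different bytes under each charmap, which is the ``cross-compiling'' situation described above: only the charmap changes, not the localedef source.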
In addition, several implementors have expressed an interest in using the charmap concept to provide the information required for support of multiple character sets. Examples of such information are encoding mechanisms, string parsing rules, default font information, etc. Such extensions are not described here.

The <escape_char> declaration was added at the request of the international community to ease the creation of portable charmap files on terminals not implementing the default backslash escape. (This approach was adopted because this is a new interface invented by POSIX.2. Historical interfaces, such as the shell command language and awk, have not been modified to accommodate this type of terminal.)

The <comment_char> declaration was added at the request of the international community to eliminate the potential confusion between the number-sign and the pound sign.

The octal number notation with no leading zero required was selected to match that of awk and tr and is consistent with that used by localedef. To avoid confusion between an octal constant and the backreferences used in localedef source, the octal, hexadecimal, and decimal constants must contain at least two digits. As single-digit constants are relatively rare, this should not impose any significant hardship. Each of the constants includes ``two or more'' digits to account for systems in which the byte size is larger than eight bits. For example, a Unicode system that has defined 16-bit bytes may require six octal, four hexadecimal, and five decimal digits.

The decimal notation is supported because some newer international standards define character values in decimal, rather than in the old column/row notation.

The charmap identifies the coded character sets supported by an implementation. At least one charmap must be provided, but no implementation is required to provide more than one.
Likewise, implementations can allow users to generate new charmaps (for instance, for a new version of the 8859 family of coded character sets), but are not required to do so. If users are allowed to create new charmaps, the system documentation must describe the rules that apply (for instance: ``only coded character sets that are supersets of ISO/IEC 646 {1} IRV, no multibyte characters, etc.'')

END_RATIONALE

2.5 Locale

A locale is the definition of the subset of a user's environment that depends on language and cultural conventions. It is made up of one or more categories. Each category is identified by its name and controls specific aspects of the behavior of components of the system. Category names correspond to the following environment variable names:

    LC_CTYPE      Character classification and case conversion.
    LC_COLLATE    Collation order.
    LC_TIME       Date and time formats.
    LC_NUMERIC    Numeric, nonmonetary formatting.
    LC_MONETARY   Monetary formatting.
    LC_MESSAGES   Formats of informative and diagnostic messages and interactive responses.

Conforming implementations shall provide the standard utilities and the interfaces in Annex B (if that option is supported) with the capability to modify their behavior based on the current locale, as defined in the Environment Variables subclause for each utility and interface.

Locales other than those supplied by the implementation can be created via the localedef utility (see 4.35), provided that the {POSIX2_LOCALEDEF} symbol is defined on the system; see 2.13.2. Otherwise, only the implementation-provided locale(s) can be used. The input to the utility is described in 2.5.2.
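In C, categories are selected through setlocale(); Python's locale module wraps that interface, so the six categories above can be sketched as a table of its constants. This assumes a POSIX-like host (locale.LC_MESSAGES may be missing on non-Unix platforms), and category_settings is a hypothetical helper name.

```python
import locale

# The six categories of 2.5 and the matching locale-module constants.
CATEGORIES = {
    "LC_CTYPE": locale.LC_CTYPE,        # character classification, case conversion
    "LC_COLLATE": locale.LC_COLLATE,    # collation order
    "LC_TIME": locale.LC_TIME,          # date and time formats
    "LC_NUMERIC": locale.LC_NUMERIC,    # numeric, nonmonetary formatting
    "LC_MONETARY": locale.LC_MONETARY,  # monetary formatting
    "LC_MESSAGES": locale.LC_MESSAGES,  # message formats, interactive responses
}


def category_settings():
    """Current locale name of each category; calling setlocale() with
    only a category queries the setting without changing it."""
    return {name: locale.setlocale(cat) for name, cat in CATEGORIES.items()}


if __name__ == "__main__":
    locale.setlocale(locale.LC_ALL, "C")   # one call sets every category
    print(category_settings())
```

Because each category is queried separately, a process can run with, say, one locale for LC_COLLATE and another for LC_MESSAGES; LC_ALL is only a shorthand for setting all of them at once.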
The value that shall be used to specify a locale when using environment variables shall be the string specified as the name operand to the localedef utility when the locale was created. The strings "C" and "POSIX" are reserved as identifiers for the POSIX Locale (see 2.5.1). When the value of a locale environment variable begins with a slash (/), it shall be interpreted as the pathname of the locale definition. If the value does not begin with a slash, the mechanism used to locate the locale is implementation defined.

If different character sets are used by the locale categories, the results achieved by an application utilizing these categories are undefined. Likewise, if different code sets are used for the data being processed by interfaces whose behavior is dependent on the current locale, or the code set is different from the code set assumed when the locale was created, the result is also undefined.

BEGIN_RATIONALE

2.5.0.1 Locale Rationale. (This subclause is not a part of P1003.2)

The description of locales is based on work performed in the UniForum Technical Committee Subcommittee on Internationalization. Wherever appropriate, keywords were taken from the C Standard {7} or the X/Open Portability Guide {B31}.

The value that shall be used to specify a locale when using environment variables is the name specified as the name operand to the localedef utility when the locale was created. This provides a verifiable method to create and invoke a locale.

The ``object'' definitions need not be portable, as long as ``source'' definitions are. Strictly speaking, ``source'' definitions are portable only between implementations using the same character set(s).
Such ``source'' definitions can, if they use symbolic names only, easily be ported between systems using different code sets as long as the characters in the portable character set (Table 2-3) have common values between the code sets; this is frequently the case in historical implementations. Of course, this requires that the symbolic names used for characters outside the portable character set be identical between character sets. The definition of symbolic names for characters is outside the scope of this standard, but is certainly within the scope of other standards organizations. When such names are standardized, future versions of POSIX.2 should require the use of these names.

Applications can select the desired locale by invoking the setlocale() function (or equivalent) with the appropriate value. If the function is invoked with an empty string, the value of the corresponding environment variable is used. If the environment variable is unset or is set to the empty string, the implementation sets the appropriate environment as defined in 2.6.

END_RATIONALE

2.5.1 POSIX Locale

Conforming implementations shall provide a POSIX Locale. The behavior of standard utilities in the POSIX Locale shall be as if the locale had been defined via the localedef utility with input data from Table 2-5, Table 2-7, Table 2-9, Table 2-10, Table 2-8, and Table 2-11, all in 2.5.2. The tables describe the characteristics and behavior of the POSIX Locale for data consisting entirely of characters from the portable character set in Table 2-3 and the control characters in Table 2-4. For characters other than those in the two tables, the behavior is unspecified.

The POSIX Locale can be specified by assigning to the appropriate environment variables the value "C" or "POSIX".
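The rules for locale values in 2.5 and 2.5.1 can be summarized in a small Python sketch. interpret_locale_value is a hypothetical classifier; because the lookup of a name that does not begin with a slash is implementation defined, it only reports that case. posix_locale_conventions selects the POSIX Locale through Python's wrapper of setlocale() and reads back a few LC_NUMERIC and LC_MONETARY values.

```python
import locale


def interpret_locale_value(value):
    """Classify a locale environment-variable value following 2.5 (sketch)."""
    if not value:
        return "implementation default"   # unset or null: see 2.6
    if value in ("C", "POSIX"):
        return "POSIX Locale"             # reserved identifiers
    if value.startswith("/"):
        return "pathname"                 # pathname of a locale definition
    return "implementation defined"       # lookup mechanism not specified


def posix_locale_conventions():
    """Select the POSIX Locale for all categories and report a few of its
    numeric and monetary conventions."""
    locale.setlocale(locale.LC_ALL, "C")
    conv = locale.localeconv()
    return conv["decimal_point"], conv["thousands_sep"], conv["currency_symbol"]
```

For instance, interpret_locale_value("da_DK") reports "implementation defined", while a value such as /usr/lib/locale/da_DK (a hypothetical path) is classified as a pathname.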
Table 2-5 shows the definition for the LC_CTYPE category.
Table 2-7 shows the definition for the LC_COLLATE category.
Table 2-8 shows the definition for the LC_MONETARY category.
Table 2-9 shows the definition for the LC_NUMERIC category.
Table 2-10 shows the definition for the LC_TIME category.
Table 2-11 shows the definition for the LC_MESSAGES category.

BEGIN_RATIONALE

2.5.1.1 POSIX Locale Rationale. (This subclause is not a part of P1003.2)

The POSIX Locale is equivalent to the "C" locale, as specified in POSIX.1 {8}. To avoid being classified as a C-language function, the name has been changed to the POSIX Locale; the environment variable value can be either "POSIX" or, for historical reasons, "C".

The POSIX definitions mirror the historical UNIX system behavior.

The use of symbolic names for characters in the tables does not imply that the POSIX Locale must be described using symbolic character names, but merely that it may be advantageous to do so.

Implementations must define a locale as the ``default'' locale, to be invoked when no environment variables are set, or when they are set to the empty string. This default locale can be the POSIX Locale or any other, implementation-defined locale. Some implementations may provide facilities for local installation administrators to set the default locale, customizing it for each location. This standard does not require such a facility.

END_RATIONALE

2.5.2 Locale Definition

The capability to specify additional locales to those provided by an implementation is optional (see 2.13.2). If the option is not supported, only implementation-supplied locales are available. Such locales shall be documented using the format specified in this clause.

Locales can be described with the file format presented in this subclause.
The file format is that accepted by the localedef utility (see 4.35). For the purposes of this subclause, the file is referred to as the locale definition file, but no locales shall be affected by this file unless it is processed by localedef or some similar mechanism. Any requirements in this subclause imposed upon ``the utility'' shall apply to localedef or to any other similar utility used to install locale information using the locale definition file format described here.

The locale definition file shall contain one or more locale category source definitions, and shall not contain more than one definition for the same locale category. If the file contains source definitions for more than one category, implementation-defined categories, if present, shall appear after the categories defined by this clause (2.5). A category source definition shall contain either the definition of a category or a copy directive. For a description of the copy directive, see 4.35. In the event that some of the information for a locale category, as specified in this standard, is missing from the locale source definition, the behavior of that category, if it is referenced, is unspecified.

A category source definition shall consist of a category header, a category body, and a category trailer. A category header shall consist of the character string naming the category, beginning with the characters LC_. The category trailer shall consist of the string END, followed by one or more <blank>s and the string used in the corresponding category header.

The category body shall consist of one or more lines of text. Each line shall contain an identifier, optionally followed by one or more operands. Identifiers shall be either keywords, identifying a particular locale element, or collating elements. In addition to the keywords defined in this standard, the source can contain implementation-defined keywords.
Each keyword within a locale shall have a unique name (i.e., two categories cannot have a commonly-named keyword); no keyword shall start with the characters LC_. Identifiers shall be separated from the operands by one or more <blank>s.

Operands shall be characters, collating elements, or strings of characters. Strings shall be enclosed in double-quotes. Literal double-quotes within strings shall be preceded by the <escape character>, described below. When a keyword is followed by more than one operand, the operands shall be separated by semicolons; <blank>s shall be allowed before and/or after a semicolon.

The first category header in the file can be preceded by a line modifying the comment character. It shall have the following format, starting in column 1:

    "comment_char %c\n", <comment character>

The comment character shall default to the number-sign (#). Blank lines and lines containing the <comment character> in the first position shall be ignored.

The first category header in the file can be preceded by a line modifying the escape character to be used in the file. It shall have the following format, starting in column 1:

    "escape_char %c\n", <escape character>

The escape character shall default to backslash, which is the character used in all examples shown in this standard.

A line can be continued by placing an escape character as the last character on the line; this continuation character shall be discarded from the input. Although the implementation need not accept any one portion of a continued line with a length exceeding {LINE_MAX} bytes, it shall place no limits on the accumulated length of the continued line. Comment lines shall not be continued on a subsequent line using an escaped <newline>.
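The line-level rules above (comment_char and escape_char lines before the first category header, ignored blank and comment lines, discarded continuation characters, and matching header and trailer lines) can be sketched in Python as a reader that yields logical lines and groups them by category. The function names are hypothetical; the sketch does not interpret operands or copy directives and does not enforce {LINE_MAX}.

```python
def _ends_with_continuation(line, esc):
    # An odd number of trailing escape characters means the last one is a
    # continuation character; an even number is a run of escaped escapes.
    trailing = len(line) - len(line.rstrip(esc))
    return trailing % 2 == 1


def logical_lines(text):
    """Yield the logical lines of a locale definition file."""
    comment, esc = "#", "\\"
    pending = None            # text accumulated from continued lines
    seen_category = False
    for raw in text.splitlines():
        if pending is None:
            # comment_char/escape_char lines may only precede the first header.
            if not seen_category and raw.startswith("comment_char "):
                comment = raw.split()[1]
                continue
            if not seen_category and raw.startswith("escape_char "):
                esc = raw.split()[1]
                continue
            # Blank and comment lines are ignored and are never continued.
            if not raw.strip() or raw.startswith(comment):
                continue
        if _ends_with_continuation(raw, esc):
            pending = (pending or "") + raw[:-1]   # drop the continuation char
            continue
        line = (pending or "") + raw
        pending = None
        if line.startswith("LC_"):
            seen_category = True
        yield line
    if pending is not None:
        yield pending


def split_categories(lines):
    """Group logical lines into {category name: body lines}."""
    categories, current, body = {}, None, []
    for line in lines:
        words = line.split()
        if not words:
            continue
        if current is None:
            if not words[0].startswith("LC_"):
                raise ValueError(f"expected a category header: {line!r}")
            if words[0] in categories:
                raise ValueError(f"duplicate category {words[0]}")
            current, body = words[0], []
        elif words == ["END", current]:
            categories[current] = body
            current = None
        else:
            body.append(line)
    if current is not None:
        raise ValueError(f"missing trailer: END {current}")
    return categories
```

With escape_char set to /, as in the Danish example locale, a line ending in / is joined with the next one, while a comment line ending in / is still simply dropped.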
Individual characters, characters in strings, and collating elements shall be represented using symbolic names, as defined below. In addition, characters can be represented using the characters themselves, or as octal, hexadecimal, or decimal constants. When nonsymbolic notation is used, the resultant locale definitions need not be portable between systems. The left angle bracket (<) is a reserved symbol, denoting the start of a symbolic name; when used to represent itself it shall be preceded by the escape character. The following rules apply to character representation:

(1) A character can be represented via a symbolic name, enclosed within angle brackets (< and >). The symbolic name, including the angle brackets, shall exactly match a symbolic name defined in the charmap file specified via the localedef -f option, and shall be replaced by a character value determined from the value associated with the symbolic name in the charmap file. The use of a symbolic name not found in the charmap file shall constitute an error, unless the category is LC_CTYPE or LC_COLLATE, in which case it shall constitute a warning condition (see localedef in 4.35 for a description of the action resulting from errors and warnings). The specification of a symbolic name in a collating-element or collating-symbol clause that duplicates a symbolic name in the charmap file (if present) is an error. Use of the escape character or a right angle bracket within a symbolic name shall be invalid unless the character is preceded by the escape character.

    Example:  <C>;<c-cedilla> "<M><a><y>"

(2) A character can be represented by the character itself, in which case the value of the character is implementation defined.
    Within a string, the double-quote character, the escape character, and the right angle bracket character shall be escaped (preceded by the escape character) to be interpreted as the character itself. Outside strings, the characters

        ,  ;  <  >  escape_char

    shall be escaped to be interpreted as the character itself.

    Example:  c ç "May"

(3) A character can be represented as an octal constant. An octal constant shall be specified as the escape character followed by two or more octal digits. Each constant shall represent a byte value. Multibyte characters can be represented by concatenated constants.

    Example:  \143;\347;\143\150 "\115\141\171"

(4) A character can be represented as a hexadecimal constant. A hexadecimal constant shall be specified as the escape character followed by an x followed by two or more hexadecimal digits. Each constant shall represent a byte value. Multibyte characters can be represented by concatenated constants.

    Example:  \x63;\xe7;\x63\x68 "\x4d\x61\x79"

(5) A character can be represented as a decimal constant. A decimal constant shall be specified as the escape character followed by a d followed by two or more decimal digits. Each constant shall represent a byte value. Multibyte values can be represented by concatenated constants.

    Example:  \d99;\d231;\d99\d104 "\d77\d97\d121"

Implementations may accept single-digit octal, decimal, or hexadecimal constants following the escape character. Only characters existing in the character set for which the locale definition is created shall be specified, whether using symbolic names, the characters themselves, or octal, decimal, or hexadecimal constants. If a charmap file is present, only characters defined in the charmap can be specified using octal, decimal, or hexadecimal constants. Symbolic names not present in the charmap file can be specified and shall be ignored, as specified under item (1) above.
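The five representations can be sketched as a Python operand decoder. It is a minimal illustration under stated assumptions: charmap is a hypothetical dictionary from symbolic names to bytes; a character written as itself is encoded as ISO 8859-1 (the standard leaves its value implementation defined); an unknown symbolic name raises KeyError, whereas localedef would report an error or, for LC_CTYPE and LC_COLLATE, a warning; and constants are taken to denote 8-bit bytes. split_operands and decode_operand are hypothetical names.

```python
import re

# Constant body after the escape character: d+decimal, x+hex, or octal.
_CONSTANT = re.compile(r"d([0-9]{2,})|x([0-9a-fA-F]{2,})|([0-7]{2,})")


def split_operands(text, esc="\\"):
    """Split an operand list on semicolons that are neither escaped nor
    inside a double-quoted string."""
    parts, current, in_string, i = [], [], False, 0
    while i < len(text):
        ch = text[i]
        if ch == esc and i + 1 < len(text):
            current.append(text[i:i + 2])   # keep escapes for decode_operand
            i += 2
            continue
        if ch == '"':
            in_string = not in_string
        if ch == ";" and not in_string:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current).strip())
    return parts


def decode_operand(text, charmap, esc="\\"):
    """Turn one operand (character, collating element, or string) into bytes."""
    if len(text) >= 2 and text[0] == text[-1] == '"':
        text = text[1:-1]                   # strip the string delimiters
    out, i = bytearray(), 0
    while i < len(text):
        ch = text[i]
        if ch == "<":                       # (1) symbolic name
            j, name = i + 1, []
            while j < len(text) and text[j] != ">":
                if text[j] == esc and j + 1 < len(text):
                    j += 1                  # escaped char is part of the name
                name.append(text[j])
                j += 1
            if j >= len(text):
                raise ValueError(f"unterminated symbolic name in {text!r}")
            out += charmap["".join(name)]   # KeyError if not in the charmap
            i = j + 1
        elif ch == esc and i + 1 < len(text):
            m = _CONSTANT.match(text, i + 1)
            if m:                           # (3)-(5) octal, hex, decimal
                dec, hexa, octal = m.groups()
                value = (int(dec) if dec is not None else
                         int(hexa, 16) if hexa is not None else int(octal, 8))
                if value > 0xFF:
                    raise ValueError(f"constant exceeds one byte in {text!r}")
                out.append(value)
                i = m.end()
            else:                           # escaped character stands for itself
                out += text[i + 1].encode("latin-1")
                i += 2
        else:                               # (2) the character itself
            out += ch.encode("latin-1")
            i += 1
    return bytes(out)
```

Applied to the octal example above, split_operands gives three operands and decode_operand turns them into b"c", b"\xe7", and b"ch", the same values the hexadecimal and decimal examples spell.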
BEGIN_RATIONALE

2.5.2.0.1 Locale Definition Rationale. (This subclause is not a part of P1003.2)

The decision to separate the file format from the localedef utility description was only partially editorial. Implementations may provide interfaces other than localedef. Requirements on ``the utility,'' mostly concerning error messages, are described in this way because they are meant to affect the other interfaces implementations may provide as well as localedef. (This is similar to the philosophy used by POSIX.1 {8}, where the descriptions of the tar and cpio file formats impose requirements on any utilities processing them.)

The text about {POSIX2_LOCALEDEF} does not mean that internationalization is optional; only that the functionality of the localedef utility is. Regular expressions, for instance, must still be able to recognize character class expressions such as [[:alpha:]]. A possible analogy is with an applications development environment: while all conforming implementations must be capable of executing applications, not all need to have the development environment installed. The assumption is that the capability to modify the behavior of utilities (and applications) via locale settings must be supported. If the localedef functionality is not provided, then the only choice is to select an existing (presumably implementation-documented) locale. An implementation could, for example, choose to support only the POSIX Locale, which would in effect limit the amount of changes from historical implementations quite drastically. The localedef utility is still required, but would always terminate with an exit code indicating that no locale could be created. Supported locales must be documented using the syntax defined in 2.5.
(This ensures that users can accurately determine what capabilities are provided. If the implementation decides to provide capabilities in addition to those in 2.5, that is already provided for.) If the option is present (i.e., locales can be created), then the localedef utility must be capable of creating locales based on the syntax and rules defined in 2.5. This does not mean that the implementation cannot also provide alternate means for creating locales.

The octal, decimal, and hexadecimal notations are the same as those employed by the charmap facility (see 2.4.1). To avoid confusion between an octal constant and a backreference, the octal, hexadecimal, and decimal constants must contain at least two digits. As single-digit constants are relatively rare, this should not impose any significant hardship. Each of the constants includes ``two or more'' digits to account for systems in which the byte size is larger than eight bits. For example, a Unicode system that has defined 16-bit bytes may require six octal, four hexadecimal, and five decimal digits (the largest 16-bit value, 65535, is \177777 in octal, \xffff in hexadecimal, and \d65535 in decimal).

This standard is intended as an international (ISO/IEC) standard as well as an IEEE standard, and must therefore follow the ISO/IEC guidelines. One such rule is that characters outside the invariant part of ISO/IEC 646 {1} should not be used in portable specifications. The backslash character is not in the invariant part; the number-sign is, but with multiple representations: as a number-sign and as a pound sign. As for general usage of these symbols, they are covered by the ``grandfather clause,'' but for newly defined interfaces, ISO has requested that POSIX provide alternate representations.
Consequently, while the default escape character remains the backslash and the default comment character is the number-sign, implementations are required to recognize alternative representations, identified in the applicable source file via the escape_char and comment_char keywords.

END_RATIONALE

2.5.2.1 LC_CTYPE

Table 2-5 - LC_CTYPE Category Definition in the POSIX Locale

LC_CTYPE
# The following is the POSIX Locale LC_CTYPE.
# "alpha" is by default "upper" and "lower"
# "alnum" is by definition "alpha" and "digit"
# "print" is by default "alnum", "punct" and the <space> character
# "graph" is by default "alnum" and "punct"
#
upper ;;;;;;;;;;;;;\
;;