Reference Documentation: PDF Publishing with GNU Troff
Revision | 5f71b353eed4fe90eb7402e101dc43d89779e053 (tree) |
---|---|
Date | 2024-10-19 03:02:11 |
Author | Keith Marshall <keith@user...> |
Committer | Keith Marshall |
Strip delimited argument escape sequences from sanitized text.
* tmac/sanitize.tmac (sanitize:esc-s, sanitize:esc-v): New macros;
each of these is implemented, internally, as an alias for...
(sanitize:esc-delimited): ...this new generic handler macro.
(sanitize:scan.delimiter.push, sanitize:scan.delimiter.pop): New
helper macros; they are used by...
(sanitize:esc-delimited): ...this, to save, and subsequently restore,
effective escape sequence delimiter context, in...
(sanitize:scan.delimiter.stack): ...this new string.
(sanitize:esc-generic): Refactored; it now uses...
(sanitize:skip-handler): ...this new string; it defines a template,
used for redirection of control to one, or other of...
(sanitize:skip-(, sanitize:skip-[): ...these, as appropriate.
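For orientation, the three argument forms which the sanitizer has to recognise look roughly as follows in ordinary groff input. This is only an illustrative input fragment, using standard groff escapes; the sample text is invented, and nothing here is taken from tmac/sanitize.tmac itself.

```groff
.\" Illustrative input only; not part of tmac/sanitize.tmac.
.\"
.\" Explicit-length arguments: one character, or "(" plus two.
Some \fBbold\fP and \f(BIbold italic\fP text.
.\"
.\" Bracketed argument, of arbitrary length.
Some \f[BI]bold italic\f[] text.
.\"
.\" Delimited argument, of arbitrary length; the delimiter is
.\" whichever character follows the escape identifier.
Some \s'+2'larger\s'-2' and \v'-0.3m'raised\v'0.3m' text.
Some \v@-0.3m@raised\v@0.3m@ text, using "@" as the delimiter.
```

Going by the commit message, the "\f" forms are handled by sanitize:esc-generic, while the "\s" and "\v" forms are the ones newly handled by sanitize:esc-delimited.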
@@ -477,6 +477,63 @@ | ||
477 | 477 | . length sanitize:residual.length \\*[sanitize:residual] |
478 | 478 | .. |
479 | 479 | . |
480 | +.\" Removal of Entire Escape Sequences from Sanitized Text | |
481 | +.\" ------------------------------------------------------ | |
482 | +.\" | |
483 | +.\" There are three classes of escape sequence which may be designated | |
484 | +.\" as candidates for complete removal from any sanitized text string: | |
485 | +.\" those with arguments of explicit length, those with arguments of | |
486 | +.\" arbitrary length, demarcated by brackets, and those with arguments | |
487 | +.\" of arbitrary length, demarcated by matching, arbitrarily chosen | |
488 | +.\" quoting characters. | |
489 | +.\" | |
490 | +.\" Typical of those in the first two classes are "\f...", and "\F...", | |
491 | +.\" these are processed by the sanitize:esc-generic handler macro: | |
492 | +.\" | |
493 | +.de sanitize:esc-generic als | |
494 | +.als sanitize:esc-f sanitize:esc-generic | |
495 | +.als sanitize:esc-F sanitize:esc-generic | |
496 | +. | |
497 | +.\" Conversely, the third class of these escape sequences is typified | |
498 | +.\" by those such as "\s'...'", and "\v'...'"; these are processed by | |
499 | +.\" the alternative sanitize:esc-delimited handler macro: | |
500 | +.\" | |
501 | +.de sanitize:esc-delimited als | |
502 | +.als sanitize:esc-s sanitize:esc-delimited | |
503 | +.als sanitize:esc-v sanitize:esc-delimited | |
504 | +. | |
505 | +.\" The formal implementation of sanitize:esc-generic follows here; | |
506 | +.\" that of sanitize:esc-delimited may be found further below. | |
507 | +.\" | |
508 | +.am sanitize:esc-generic | |
509 | +.\" Usage (internal): .sanitize:esc-X | |
510 | +.\" | |
511 | +.\" (X represents any legitimate single-character escape sequence id). | |
512 | +.\" | |
513 | +.\" Handler for skipping "\X" sequences, in text which is to be sanitized; | |
514 | +.\" this will automatically detect sequences conforming to any of the forms | |
515 | +.\" "\Xc", "\X(cc", or "\X[...]", and will handle each appropriately. The | |
516 | +.\" implementation is generic, and may be aliased to handle any specific | |
517 | +.\" escape sequences, which exhibit similar semantics. | |
518 | +.\" | |
519 | +.\" At least one additional argument character is consumed; this may extend | |
520 | +.\" to further argument characters, when a skip handler is associated with | |
521 | +.\" this initial argument character. | |
522 | +.\" | |
523 | +. sanitize:scan.execute | |
524 | +. if d \\*[sanitize:skip-handler] .\\*[sanitize:skip-handler] | |
525 | +.. | |
526 | +.\" For escape sequences of the "\Xc" form, the above handler has already | |
527 | +.\" consumed the entire sequence, by calling the sanitize:scan.execute | |
528 | +.\" macro; however, the "\X(cc", and "\X[...]" forms require the additional | |
529 | +.\" services provided by the pair of sub-handlers designated by names which | |
530 | +.\" match this sanitize:skip-handler template: | |
531 | +.\" | |
532 | +.ds sanitize:skip-handler "sanitize:skip-\E*[sanitize:scan.char]\" | |
533 | +. | |
534 | +.\" The first of the sanitize:skip-handler service macros provides the | |
535 | +.\" support required to handle "\X(cc" escape sequences. | |
536 | +.\" | |
480 | 537 | .de sanitize:skip-( |
481 | 538 | .\" Usage (internal): .sanitize:skip-( |
482 | 539 | .\" |
@@ -486,6 +543,9 @@ | ||
486 | 543 | . nr sanitize:residual.length -2 |
487 | 544 | . substring sanitize:residual 2 |
488 | 545 | .. |
546 | +.\" Conversely, the second sanitize:skip-handler macro provides the support | |
547 | +.\" required for handling "\X[...]" escape sequences. | |
548 | +.\" | |
489 | 549 | .de sanitize:skip-[ |
490 | 550 | .\" Usage (internal): .sanitize:skip-[ |
491 | 551 | .\" |
@@ -508,26 +568,134 @@ | ||
508 | 568 | . el .nr sanitize:skip.count 0 |
509 | 569 | . \} |
510 | 570 | .. |
511 | -.de sanitize:esc-generic | |
512 | -.\" Usage (internal): .sanitize:esc-X | |
513 | -.\" | |
514 | -.\" (X represents any legitimate single-character escape sequence id). | |
571 | +. | |
572 | +.\" The implementation of the alternative sanitize:esc-delimited handler | |
573 | +.\" macro, as used for escape sequences such as "\s'...'", and "\v'...'", | |
574 | +.\" follows here. | |
515 | 575 | .\" |
516 | -.\" Handler for skipping "\X" sequences, in text which is to be sanitized; | |
517 | -.\" this will automatically detect sequences conforming to any of the forms | |
518 | -.\" "\Xc", "\X(cc", or "\X[...]", and will handle each appropriately. The | |
519 | -.\" implementation is generic, and may be aliased to handle any specific | |
520 | -.\" escape sequences, which exhibit similar semantics. | |
576 | +.am sanitize:esc-delimited | |
577 | +.\" Usage (internal, as alias): .sanitize:esc-X | |
521 | 578 | .\" |
522 | -. sanitize:scan.execute | |
523 | -. if d sanitize:skip-\\*[sanitize:scan.char] \ | |
524 | -. sanitize:skip-\\*[sanitize:scan.char] | |
579 | +.\" (X represents any legitimate single-character escape sequence id.) | |
580 | +.\" | |
581 | +.\" Handler for skipping escape sequences of the "\X'arg...'" form, in | |
582 | +.\" which the sequence accepts an argument bounded by a quoting pair of | |
583 | +.\" arbitrary, but matching delimiter characters. Nesting of similar | |
584 | +.\" sequences is supported, within the quoted argument, regardless of | |
585 | +.\" whether the nested sequence arguments are quoted by the same, or | |
586 | +.\" by distinct delimiter characters. | |
587 | +.\" | |
588 | +.\" Begin by backing up the current input character, acquiring the next, | |
589 | +.\" and pushing it on to the pending delimiter stack. | |
590 | +.\" | |
591 | +. as sanitize:hold \\*[sanitize:scan.char] | |
592 | +. sanitize:scan.delimiter.push | |
593 | +. | |
594 | +.\" Initiate an iterative look-ahead loop, backing up, acquiring, and | |
595 | +.\" analysing successive input characters, until the matching delimiter | |
596 | +.\" is detected. | |
597 | +.\" | |
598 | +. while \\n[sanitize:residual.length] \{\ | |
599 | +. as sanitize:hold \\*[sanitize:scan.char] | |
600 | +. sanitize:scan.execute | |
601 | +. if '\\*[sanitize:scan.char]'\\*[sanitize:scan.delimiter]' \{\ | |
602 | +. | |
603 | +. \" Found the closing delimiter; discard the entire sequence, | |
604 | +. \" scanned up to this point, pop the pending delimiter stack, | |
605 | +. \" and return to the top level scan of any residual input. | |
606 | +. \" | |
607 | +. sanitize:scan.delimiter.pop | |
608 | +. return | |
609 | +. \} | |
610 | +. if '\\*[sanitize:scan.char]'^[' \{\ | |
611 | +. | |
612 | +. \" An embedded escape sequence has been encountered, within | |
613 | +. \" the scope of the delimiters; process it, before resuming | |
614 | +. \" the look-ahead cycle. | |
615 | +. \" | |
616 | +. as sanitize:hold "^[\" | |
617 | +. sanitize:scan.execute | |
618 | +. if d \\*[sanitize:esc-handler] .\\*[sanitize:esc-handler] | |
619 | +. \} | |
620 | +. \} | |
621 | +.\" If here, no closing delimiter has been found; restore the entire | |
622 | +.\" sequence, comprising all backed up content, followed by the last | |
623 | +.\" scanned input character, and return it. | |
624 | +.\" | |
625 | +. as sanitize:result \\*[sanitize:hold]\\*[sanitize:scan.char] | |
525 | 626 | .. |
526 | -.\" Map the generic handler to specific escape sequences, as required. | |
627 | +.\" Augment the base handler, by defining the delimiter stack, its | |
628 | +.\" associated sanitize:scan.delimiter.push, and the complementary | |
629 | +.\" sanitize:scan.delimiter.pop helper macros. | |
527 | 630 | .\" |
528 | -.als sanitize:esc-f sanitize:esc-generic | |
529 | -.als sanitize:esc-F sanitize:esc-generic | |
631 | +.\" Both the sanitize:scan.delimiter.push, and its complementary | |
632 | +.\" sanitize:scan.delimiter.pop macros assume that the associated | |
633 | +.\" sanitize:scan.delimiter.stack string pre-exists; to avoid any | |
634 | +.\" groff warning, ensure that it does, albeit initially empty. | |
530 | 635 | .\" |
636 | +.ds sanitize:scan.delimiter.stack "\" empty string | |
637 | +. | |
638 | +.de sanitize:scan.delimiter.push | |
639 | +.\" Usage (internal): .sanitize:scan.delimiter.push [delimiter list ...] | |
640 | +.\" | |
641 | +.\" Acquire the next available token, from the scanned input text, and | |
642 | +.\" push it on to the pending delimiter stack, whence it represents the | |
643 | +.\" active delimiter for the current level of escape sequence quoting. | |
644 | +.\" | |
645 | +.\" Called by the sanitize:esc-delimited macro, WITHOUT arguments, it | |
646 | +.\" subsequently calls itself, recursively, WITH arguments representing | |
647 | +.\" the entire updated stack content, as a space-separated list of all | |
648 | +.\" pending delimiter tokens; the first argument represents the most | |
649 | +.\" recently pushed, and also the currently pending delimiter. | |
650 | +.\" | |
651 | +. ie \\n[.$] .ds sanitize:scan.delimiter.stack \\$* | |
652 | +. el \{\ | |
653 | +. \" This is the normal macro entry point; acquire the next delimiter | |
654 | +. \" token, and then recurse to push it on to the delimiter stack. | |
655 | +. \" | |
656 | +. sanitize:scan.execute | |
657 | +. \\$0 \\*[sanitize:scan.char] \\*[sanitize:scan.delimiter.stack] | |
658 | +. | |
659 | +. \" Ultimately, assign this new delimiter token as "active". | |
660 | +. \" | |
661 | +. ds sanitize:scan.delimiter \\*[sanitize:scan.char] | |
662 | +. \} | |
663 | +.. | |
664 | +.de sanitize:scan.delimiter.pop | |
665 | +.\" Usage (internal): .sanitize:scan.delimiter.pop [delimiter list ...] | |
666 | +.\" | |
667 | +.\" Complements the sanitize:scan.delimiter.push macro; this discards | |
668 | +.\" the most recently active delimiter token from the top of the stack, | |
669 | +.\" and makes the next entry, if any, the new "active" delimiter. | |
670 | +.\" | |
671 | +.\" If there is nothing on the stack, this becomes a no-op. | |
672 | +.\" | |
673 | +. if '\\*[sanitize:scan.delimiter.stack]'' .return | |
674 | +. | |
675 | +.\" Like sanitize:scan.delimiter.push, sanitize:scan.delimiter.pop is | |
676 | +.\" called by sanitize:esc-delimited, WITHOUT arguments; it then calls | |
677 | +.\" itself, recursively, WITH arguments representing the stack content, | |
678 | +.\" prior to removal of its unwanted top entry. | |
679 | +.\" | |
680 | +. ie \\n[.$] \{\ | |
681 | +. \" This is the recursive re-entry; the first argument represents | |
682 | +. \" the unwanted top stack entry, so shift it out of the way, and | |
683 | +. \" reassign the stack, to represent only any remaining entries. | |
684 | +. \" | |
685 | +. shift | |
686 | +. ds sanitize:scan.delimiter.stack "\\$*\" | |
687 | +. | |
688 | +. \" Additionally, assign the new top-of-stack entry, as the new | |
689 | +. \" actively pending delimiter token. | |
690 | +. \" | |
691 | +. ds sanitize:scan.delimiter \\$1 | |
692 | +. \} | |
693 | +.\" Alternatively, when called normally, without arguments, simply | |
694 | +.\" recurse, passing the current stack contents as arguments. | |
695 | +.\" | |
696 | +. el .\\$0 \\*[sanitize:scan.delimiter.stack] | |
697 | +.. | |
698 | +. | |
531 | 699 | .\" Local Variables: |
532 | 700 | .\" mode: nroff |
533 | 701 | .\" End: |
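The push and pop helpers in this commit keep the delimiter stack in a single string, as a space-separated list, and re-invoke themselves through \$0 so that groff splits that list into macro arguments. The stand-alone sketch below isolates that idiom. It rests on some assumptions: the demo:* names are hypothetical, and demo:push is simplified to take its token as an explicit argument, whereas the committed macro acquires it by calling sanitize:scan.execute.

```groff
.\" Sketch only: the "demo:" names are hypothetical, and are not
.\" part of tmac/sanitize.tmac.
.\"
.ds demo:stack "\" initially empty
.
.de demo:push
.  \" Prepend the new token; older tokens follow, space-separated.
.  ds demo:stack \\$1 \\*[demo:stack]
.  ds demo:top \\$1
..
.de demo:pop
.  \" Nothing to do, when the stack is already empty.
.  if '\\*[demo:stack]'' .return
.  ie \\n[.$] \{\
.    \" Recursive re-entry: the arguments are the old stack, so
.    \" drop the first, and store whatever remains.
.    shift
.    ds demo:stack "\\$*\"
.    ds demo:top \\$1
.  \}
.  \" Normal entry: recurse, passing the stack as arguments.
.  el .\\$0 \\*[demo:stack]
..
.demo:push '
.demo:push @
.tm before pop: top=\*[demo:top] stack=\*[demo:stack]
.demo:pop
.tm after pop: top=\*[demo:top] stack=\*[demo:stack]
```

Processing this with `groff -z` should write the two `.tm` diagnostics to the standard error stream, showing the active delimiter before and after the pop. Storing the stack as a plain string keeps the helpers free of any register arithmetic, at the cost of requiring that delimiter tokens survive macro argument splitting.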