Reference Documentation: PDF Publishing with GNU Troff
Revision | 1dd113b9870ccdebd88207f30ee752b67f5bde3c (tree) |
---|---|
Date | 2024-10-11 04:38:50 |
Author | Keith Marshall <keith@user...> |
Committer | Keith Marshall |
Improve handling of groff special characters in sanitized text.
* tmac/sanitize.tmac (sanitize:esc-(hy): Redefine as a string, with
value equivalent to a single ASCII hyphen-minus; now used by...
(sanitize:esc-(??.subst): ...this new macro; it handles substitutions
for two-character escapes of the form "\(??", on behalf of...
(sanitize:esc-(): ...this; add a sanity check, to ensure that the
input text residual comprises no fewer than two characters, as needed
to complete the "\(??" escape sequence; otherwise simplify, delegating
handling of the entire substitution operation to...
(sanitize:esc-(??.subst): ...this; it is also used by...
(sanitize:esc-[): ...this new macro; it also performs two-character
escape substitution, for the alternative groff "\[??]" form, such that
escapes of both "\(??" and "\[??]" forms are substituted congruently.
(sanitize:esc-(mi, sanitize:esc-(en): New strings; they define
substitutions for "\(mi" and "\(en", respectively; both are aliases...
(sanitize:esc-(hy): ...for this; thus all three result in substitution
of a single ASCII hyphen-minus glyph.
(sanitize:esc-(em): New string; it defines a substitution for the
"\(em" escape, such that it is replaced by a conjoined pair of ASCII
hyphen-minus glyphs.
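
The lookup described in the log entry above can be tried in isolation. What follows is a minimal standalone sketch, not the code from tmac/sanitize.tmac: every "demo:" name is hypothetical, and the demo:subst macro takes the text following "\(" as its argument, rather than working on the sanitize:residual string as the real handlers do.

```groff
.\" Standalone sketch; all "demo:" names are hypothetical, and are
.\" not part of sanitize.tmac.
.\"
.\" Substitution table, keyed by the two-character escape name.
.\"
.ds demo:esc-(hy -
.ds demo:esc-(em --
.als demo:esc-(en demo:esc-(hy
.als demo:esc-(mi demo:esc-(hy
.
.de demo:subst
.\" Usage: .demo:subst text
.\"
.\" "text" stands in for the input which follows "\("; it must
.\" comprise at least two characters, else the call is a no-op.
.\"
.  length demo:len \\$1
.  if 2>\\n[demo:len] .return
.\"
.\" Isolate the two-character name, (indices are zero-based)...
.\"
.  ds demo:name \\$1
.  substring demo:name 0 1
.\"
.\" ...and substitute, only if a matching string has been defined.
.\"
.  ie d demo:esc-(\\*[demo:name] \{\
.    ds demo:out \\*[demo:esc-(\\*[demo:name]]
.    if \\n[demo:len]>2 \{\
.      \" Retain any input beyond the two-character name.
.      ds demo:tail \\$1
.      substring demo:tail 2
.      as demo:out \\*[demo:tail]
.    \}
.    tm \\$1 -> \\*[demo:out]
.  \}
.  el .tm \\$1 -> no substitution defined
..
.demo:subst emdash
.demo:subst hy
.demo:subst en
.demo:subst zz
```

If saved under a hypothetical name such as demo.roff, the sketch can be run with "groff -z demo.roff"; each .tm request writes its result to standard error. By the same naming convention, a document could presumably add further mappings of its own, for example ".ds sanitize:esc-(bu *" for "\(bu"; that particular mapping is an illustration only, and is not among the defaults which this commit defines.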
@@ -81,12 +81,11 @@ | ||
81 | 81 | . \" |
82 | 82 | . rn sanitize:scan.char sanitize:hold |
83 | 83 | . sanitize:scan.execute |
84 | -. ie d sanitize:esc-\\*[sanitize:scan.char] \{\ | |
85 | -. \" | |
86 | -. \" ...which we delegate to its appropriate handler, to skip... | |
87 | -. \" | |
88 | -. sanitize:esc-\\*[sanitize:scan.char] | |
89 | -. \} | |
84 | +. | |
85 | +. \" ...to check for an identifiable escape sequence, which may be | |
86 | +. \" delegated to its appropriate handler... | |
87 | +. \" | |
88 | +. ie d \\*[sanitize:esc-handler] .\\*[sanitize:esc-handler] | |
90 | 89 | . |
91 | 90 | . \" ...but, in the case of an unrecognized escape sequence, we copy |
92 | 91 | . \" its backed-up content, followed by the character retrieved from |
@@ -138,6 +137,11 @@ | ||
138 | 137 | . if '!*[sanitize:scan.char]'\\' .ds sanitize:scan.char "^[\" |
139 | 138 | . ec |
140 | 139 | .. |
140 | +.\" Define a shorthand notation for identification of the internal entry | |
141 | +.\" point names of available escape sequence handlers. | |
142 | +.\" | |
143 | +.ds sanitize:esc-handler "sanitize:esc-\E*[sanitize:scan.char]\" | |
144 | +. | |
141 | 145 | . |
142 | 146 | .\" Filters for Removal and Substitution of Special Tokens |
143 | 147 | .\" ------------------------------------------------------ |
@@ -357,6 +361,122 @@ | ||
357 | 361 | . el .if '\\*[\\$1.try]'\\*[sanitize:scan.char]' .nr \\$1.matched 1 |
358 | 362 | . \} |
359 | 363 | .. |
364 | +. | |
365 | +.\" Special Character Substitutions in Sanitized Text | |
366 | +.\" ------------------------------------------------- | |
367 | +.\" | |
368 | +.\" Special characters, as defined in groff_char(7), will usually not | |
369 | +.\" be rendered reliably within sanitized text; thus, the sanitize macro | |
370 | +.\" supports a simple substitution mechanism, for any special character | |
371 | +.\" which is represented by a two-character escape name. Specifically, | |
372 | +.\" any such special character may be mapped, using a notation in the | |
373 | +.\" style derived from their traditional troff "\(cc" representation, | |
374 | +.\" similar to the sanitize:esc-(cc defaults, below. | |
375 | +.\" | |
376 | +.\" Define sanitized substitutions for "\(hy" and "\(em"; map them to | |
377 | +.\" a single ASCII hyphen-minus, and a conjoined pair of hyphen-minus | |
378 | +.\" glyphs, respectively. | |
379 | +.\" | |
380 | +.ds sanitize:esc-(hy - | |
381 | +.ds sanitize:esc-(em -- | |
382 | +. | |
383 | +.\" Translate "\(en" and "\(mi" as equivalents for "\(hy". | |
384 | +.\" | |
385 | +.als sanitize:esc-(en sanitize:esc-(hy | |
386 | +.als sanitize:esc-(mi sanitize:esc-(hy | |
387 | +. | |
388 | +.\" Each of these substitutions is linked into the sanitize macro | |
389 | +.\" processing sequence by one or other of two sanitize:esc-handler | |
390 | +.\" conformantly named macros, sanitize:esc-( and sanitize:esc-[ | |
391 | +.\" | |
392 | +.de sanitize:esc-( | |
393 | +.\" Usage (internal): .sanitize:esc-( | |
394 | +.\" | |
395 | +.\" Handler for interpretation of special character escape sequences | |
396 | +.\" which are expressed in the form "\(??"; the residual MUST comprise | |
397 | +.\" at least two characters, to match this escape sequence format, or | |
398 | +.\" the call is handled as a no-op. | |
399 | +.\" | |
400 | +. if 2>\\n[sanitize:residual.length] .return | |
401 | +.\" | |
402 | +.\" Only when there are sufficient characters available, do we | |
403 | +.\" attempt to interpret the escape sequence; when a substitution | |
404 | +.\" has been defined, for the current input escape sequence, it | |
405 | +.\" replaces the initial two characters of the residual. | |
406 | +.\" | |
407 | +. \\$0??.subst \\$0 2 | |
408 | +.\" | |
409 | +.\" Regardless of whether a substitution occurred, or not, the | |
410 | +.\" preceding call will have created a local look-ahead buffer; | |
411 | +.\" it may be safely discarded. | |
412 | +.\" | |
413 | +. rm \\$0:look-ahead | |
414 | +.. | |
415 | +.de sanitize:esc-[ | |
416 | +.\" Usage (internal): .sanitize:esc-[ | |
417 | +.\" | |
418 | +.\" An analogue for sanitize:esc-(, this handles "\(??" two-character | |
419 | +.\" escapes, which have been expressed in groff's alternative "\[??]" | |
420 | +.\" format; just as sanitize:esc-( requires no fewer than two residual | |
421 | +.\" characters, this requires no fewer than three of them, (one extra, | |
422 | +.\" to match the closing "]"), or it becomes a no-op. | |
423 | +.\" | |
424 | +. if 3>\\n[sanitize:residual.length] .return | |
425 | +.\" | |
426 | +.\" Only when there are sufficient characters available, do we attempt | |
427 | +.\" to interpret the escape sequence; in this case, the third residual | |
428 | +.\" character must be the closing "]"... | |
429 | +.\" | |
430 | +. ds \\$0:look-ahead \\*[sanitize:residual] | |
431 | +. substring \\$0:look-ahead 2 2 | |
432 | +.\" | |
433 | +.\" ...in which case, we may process any substitution which has been | |
434 | +.\" defined, replacing the initial three characters of the residual, | |
435 | +.\" while retaining any which extend beyond this initial three... | |
436 | +.\" | |
437 | +. if '\\*[\\$0:look-ahead]']' .sanitize:esc-(??.subst \\$0 3 | |
438 | +.\" | |
439 | +.\" ...and ultimately, discarding the local look-ahead buffer. | |
440 | +.\" | |
441 | +. rm \\$0:look-ahead | |
442 | +.. | |
443 | +.\" Each of this pair of sanitize:esc-handler macros delegates the | |
444 | +.\" actual substitution to the sanitize:esc-(??.subst macro. | |
445 | +.\" | |
446 | +.de sanitize:esc-(??.subst | |
447 | +.\" Usage (internal): .sanitize:esc-(??.subst caller count | |
448 | +.\" | |
449 | +.\" Perform substitution, if any is specified, for a two-character | |
450 | +.\" escape, which has been expressed in either troff "\(??", or the | |
451 | +.\" groff alternative "\[??]" format; in either case, a substitution | |
452 | +.\" is available, only if a string named sanitize:esc-(?? has been | |
453 | +.\" defined, with its actual "??" suffix matching the first two | |
454 | +.\" characters of the sanitize:residual string. | |
455 | +.\" | |
456 | +. ds \\$1:look-ahead \\*[sanitize:residual] | |
457 | +. substring \\$1:look-ahead 0 1 | |
458 | +. if !d sanitize:esc-(\\*[\\$1:look-ahead] .return | |
459 | +. | |
460 | +.\" If we are still here, then a substitution has been identified; | |
461 | +.\" process it, collecting an updated residual into the designated | |
462 | +.\" look-ahead buffer... | |
463 | +.\" | |
464 | +. ds \\$1:look-ahead \\*[sanitize:esc-(\\*[\\$1:look-ahead]] | |
465 | +. if \\n[sanitize:residual.length]>\\$2 \{\ | |
466 | +. \" ...appending any part of the input residual, which extends | |
467 | +. \" beyond the end of the escape sequence... | |
468 | +. \" | |
469 | +. substring sanitize:residual \\$2 | |
470 | +. as \\$1:look-ahead \\*[sanitize:residual] | |
471 | +. \} | |
472 | +. | |
473 | +.\" ...then ultimately, replacing the original residual with this | |
474 | +.\" updated string value, and recomputing its length. | |
475 | +.\" | |
476 | +. als sanitize:residual \\$1:look-ahead | |
477 | +. length sanitize:residual.length \\*[sanitize:residual] | |
478 | +.. | |
479 | +. | |
360 | 480 | .de sanitize:skip-( |
361 | 481 | .\" Usage (internal): .sanitize:skip-( |
362 | 482 | .\" |
@@ -388,28 +508,6 @@ | ||
388 | 508 | . el .nr sanitize:skip.count 0 |
389 | 509 | . \} |
390 | 510 | .. |
391 | -.de sanitize:esc-( | |
392 | -.\" Usage (internal): .sanitize:esc-( | |
393 | -.\" | |
394 | -.\" Handler for interpretation special character escape sequences, | |
395 | -.\" which are expressed in the form "\(xx". | |
396 | -.\" | |
397 | -. ds sanitize:look-ahead \\*[sanitize:residual] | |
398 | -. substring sanitize:look-ahead 0 1 | |
399 | -. if d \\$0\\*[sanitize:look-ahead] .\\$0\\*[sanitize:look-ahead] | |
400 | -. rm sanitize:look-ahead | |
401 | -.. | |
402 | -.de sanitize:esc-(hy | |
403 | -.\" Usage (internal): .sanitize:esc-(hy | |
404 | -.\" | |
405 | -.\" Handler for translation of "\(hy" special character escapes | |
406 | -.\" within sanitized PDF outline entries; each is replaced by an | |
407 | -.\" (approximately) equivalent ASCII "-" character. | |
408 | -.\" | |
409 | -. substring sanitize:residual 2 | |
410 | -. ds sanitize:residual -\\*[sanitize:residual] | |
411 | -. nr sanitize:residual.length -1 | |
412 | -.. | |
413 | 511 | .de sanitize:esc-generic |
414 | 512 | .\" Usage (internal): .sanitize:esc-X |
415 | 513 | .\" |