In recent weeks, the production system I use to export concrete documents in multiple languages and formats from abstract data representations has come back to life. While I plan and implement new features, I use the system to document the existing ones. Here is a glimpse of what is going on.

Mechanisms for Subgrouping within a line

Normally each text block occupies one line. This gives us the maximal overwritability that we need for multilinguality/pluriversionality. Occasionally, however, we do want to cram several fields into one line. For this we provide the following mechanisms.

The modifier “+1” behind the format structure variable says that the line content is a 1-dimensional list structure. As with variable definitions, multidimensional structures are possible; the syntax is the same.

    @tab = /proc+table/proc+row+lvlmax+1/
    (_tb @tab
    |a|1|
    |b|2|
    )

The modifier “+tabregex” points to a textchunk variable. In the following example the variable mrex holds a regex whose two capture groups pick out the fields of each line.

    mrex = \\R\{(.*)\}\{(.*)\}
    @tab = /proc+table/proc+row+match+mrex/
    (_tb @tab
    \R{a}{1}
    \R{b}{2}
    )

With the modifier +split+srex we point to the text variable srex, whose value is a Perl regex like \s*&\s*. The line content is separated into sublines by passing this regex to the Perl function split.

    srex = \s*\&\s*
    @tab = /proc+table/proc+row+split+srex/
    (_tb @tab
    a & 1
    a & 2
    )

With the modifier +lrsplit+lrex we point to the text variable lrex, whose value is a Perl regex like ([.?!:])\s+([[:upper:]]), with which the line content is separated into sublines.

    lrex = ([.?!:])\s+([[:upper:]])
    @sent = /proc+lines+lrsplit+lrex/
    (_tb @sent
    Odi et amo. Quare id faciam? Fortasse requiris. Nescio! Sed fieri sentio. Et excrucior!
    )

Turn MLHT text into a database source format

We have legacy notation in /sig/oas/15/01/spez/_dok.oas_spez1501.txt and similar files that depends on Deplate and is activated with:

    person id=lyre nom="Lyre" typ=frm rol=adr lok=ham plz=24057 str="Bierbauch" dom=19 mail="sme@lyre.com"
    person id=ldapaper nom="LDA Paper UK LLP" typ=frm rol=adr lok=fra plz=63207 str="Nordkaiplatz" dom=1 mail="fabius.fairmayor@ldapaper.com"
    ag dok=memoS1 nom="Memorandum Quintus Baum GmbH Deutsch-Chinesisch" des="90 Zeilen a 1,40 EUR" mon=126 ust=00 odat=2015-01-08 fdat=2015-01-12 de=lingoserv rkod="Auftrag 1501023" status=f:Rechnung stellen.
    ag dok=imibS1 nom="Gesamt" des="ADV-Dokumente ins Chinesische und Japanische" odat=2015-01-09 fdat=2015-01-13 rkod="" de=lingoserv pre=imibS1b status=f:Rechnung stellen.

Replace this with something like

    (_adr @ul//adrdb
    (lyre
    dabagrup = adrdb
    typ = frm
    rol = adr
    lok = ham
    plz = 24057
    str = Bierbauch
    dom = 19
    mail = sme@lyre.com
    Lyre
    )
    (ldapaper/adrdb
    dabagrup = adrdb
    typ = frm
    rol = adr
    lok = fra
    plz = 63207
    str = Nordkaiplatz
    dom = 1
    mail = fabius.fairmayor@ldapaper.com
    LDA Paper UK LLP
    )
    )

and

    (_spz @ul//spzdb
    (memoS1
    dabagrup = spzdb
    %flds = ||mon|126||ust|00||odat|2015-01-08||fdat|2015-01-12||de|lingoserv||rkod|Auftrag 1501023||status|f|
    Memorandum Quintus Baum GmbH Deutsch-Chinesisch
    90 Zeilen a 1,40 EUR
    )
    (imibS1
    dabagrup = spzdb
    ||odat|2015-01-09||fdat|2015-01-13||rkod|||de|lingoserv||pre|imibS1b||status|f|
    Gesamt
    ADV-Dokumente ins Chinesische und Japanische
    )
    )

so that the program dokdata2db can then find, in the dbm file, internal special textchunks such as

    _dabagrups_ = +spzdb+adrdb+
    _dabagrup_spzdb_rellits_ = +imibS1+memoS1+
    _dabagrup_adrdb_rellits_ = +ldapaper+lyre+

Going on from there, dokdata2db should be able to use rellits2putrek or similar to write the subelements of the identified rellits to the database.

The _dabagrup_ info |+person+adr+tel|+tit+des+| consists of two lists: (1) the names of the dabarels involved, (2) shorthands of the fields that constitute the initial lines of the textchunk body.

To simplify, we first specify the data that each record uses with an attribute field

    (imibS1 @ll/spzdb
    odat = 2015-01-09
    fdat = 2015-01-13
    rkod =
    de = lingoserv
    pre = imibS1b
    status =f
    Gesamt
    ADV-Dokumente ins Chinesische und Japanische
    )

or, for additional notational simplicity, we allow attributes to be specified in hash notation alongside the plain attribute notation.
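
The two compact notations used above, plus-delimited lists such as +imibS1+memoS1+ and hash notation such as ||odat|2015-01-09||...|, can be illustrated with a small parser. The following Python sketch is not part of the system, which relies on Perl. The function names parse_pluslist and parse_hash are hypothetical, and the handling of empty values (as in ||rkod|||de|...) is an assumption read off the examples, not a documented rule.

```python
import re

# Hypothetical helpers: the names and exact semantics are assumptions
# inferred from the examples in this post, not the system's real API.


def parse_pluslist(text):
    """Split a plus-delimited list such as '+imibS1+memoS1+' into its items."""
    return [item for item in text.strip().split("+") if item]


# One '||key|value' pair; the value may be empty, as in '||rkod|||de|...'.
_PAIR = re.compile(r"\|\|([^|]+)\|([^|]*)")


def parse_hash(text):
    """Parse hash notation such as '||odat|2015-01-09||de|lingoserv|' into a dict."""
    return {key: value for key, value in _PAIR.findall(text)}


if __name__ == "__main__":
    print(parse_pluslist("+imibS1+memoS1+"))
    print(parse_hash("||odat|2015-01-09||fdat|2015-01-13||rkod|||de|lingoserv||pre|imibS1b||status|f|"))
```

Matching pair by pair, rather than splitting on ||, keeps an empty value from swallowing the separator of the following pair.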

The hash notation would imply that we are specifying final values that are not subjected to further processing.

    (_spz @ll/spzdb
    ||odat|2015-01-09||fdat|2015-01-13||de|lingoserv||pre|imibS1b||status|f|
    rkod =
    Gesamt
    ADV-Dokumente ins Chinesische und Japanische
    )

For the parent section we would specify a shorthand of sub_dabagrup by an extra slash, suggesting an extra hierarchy level.

    (_adr @ll//adrdb
    (memoS1
    ...
    )
    ...
    )

These notations would set the internal sub_dabagrup attribute (double slash) or the dabagrup attribute (single slash). This attribute would then trigger the pushing of the lit onto the corresponding internal rellits list, e.g. _dabagrup_spzdb_rellits_.

Support migration of contents

Certain sections whose contents might move elsewhere must be referenceable by a document id, including a document id URI, even though they are not documents. The anchor symbol of the living version of such a section would be marked with an asterisk suffix, after which an optional, prependable part could be added that would form the dok URI.

Short form of group opener without space

There seems to be no reason to separate the group opening bracket and section anchor from the following formatter variable argument by whitespace. Especially in the normal case, where no further arguments exist, a group opening without whitespace would look more elegant.

    (@vrb
    (_v@vrb+1
    (_tb@tab+tabregex
    (oas_adr*@ulsekt//adrdb

Make group-local trigger/hook variables definable globally

    %indproc = ||enumlist|1||itemlist|1||ilinioi|1||minitrivlist|1|
    %sfx2fun = ||url|+call+ahurlval_verb+||dok|+call+ahdokval_verb+|

Conversely, it would be desirable to make some variable types that are now defined only globally, e.g. formatter hierarchy structure variables, definable group-locally as well.

Activation of formatters by mere textchunk naming

By means of the following mapping hash we could ensure that the section anchor name suffixes _ula and _ulb imply use of the formatter structure @ul, so that this formatter would no longer have to be specified in the document.

    %litjung = ||_vb|@vrb||_ct|@cit||_ol|@ol||_ul|@ul|

This way documents could become even leaner. However, they would also become less flexible, because naming would be burdened with a function. Due to this disadvantage, this feature is of low priority.

Building text blocks from external sources

We can already assemble text blocks from text blocks defined elsewhere. There is still a need for further variants. For example, it might be necessary to handle the text blocks read in from outside with a call ...

In /adv/perl/A2E/Tmplfil.pm.tmpl#ELGRUP, develop whatever is needed to make files like /sig/oas/_lng.oas.txt work well. The constructs proposed in ELGRUP are often less sensible than the implementation with proc/call invocations shown below, since the latter does not create text variables. In other words, the ELGRUP constructs are optimal only when one wants to be able to override text variables in other languages. Such situations are probably rather rare.

    (_lst @ul ! vals)
    @alin = /proc+alineas/proc+linioi/
    (intro @alin ! +swpat+eupla+lisboa+)
    # warn: for backward compatibility only
    @litsvar = +swpat+eupla+lisboa+
    (intro @alin ! lits)
    (_lst @ul mapcall)

The following is probably better

    (_in /proc+include/ dbmvals_log.txt )

or the following

    (_ir /call+include_re/ tabregex dbmvals_log.txt )

or, respectively,

    (_mc /call+mapcall/ ahval_verb valsvar )
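
How a block might be assembled from named chunks can be sketched outside the system. The following Python sketch assumes, purely for illustration, that a store maps chunk names to their text, that a variable such as +swpat+eupla+lisboa+ lists the chunks to pull in, and that a mapcall applies one function to each of them in order. CHUNKS, as_item and the Python functions parse_pluslist and mapcall are hypothetical, as are the placeholder chunk texts; none of this reflects the actual Perl implementation.

```python
# Hypothetical chunk store: the names come from the example list
# +swpat+eupla+lisboa+, the texts are invented placeholders.
CHUNKS = {
    "swpat": "Text of chunk swpat",
    "eupla": "Text of chunk eupla",
    "lisboa": "Text of chunk lisboa",
}


def parse_pluslist(text):
    """Split a plus-delimited list such as '+swpat+eupla+lisboa+' into names."""
    return [item for item in text.split("+") if item]


def mapcall(func, names, store):
    """Apply func to each named chunk, in list order (assumed semantics)."""
    return [func(store[name]) for name in names]


def as_item(text):
    """Stand-in formatter: render one chunk as a list item."""
    return "- " + text


names = parse_pluslist("+swpat+eupla+lisboa+")
block = "\n".join(mapcall(as_item, names, CHUNKS))
print(block)
```

In this reading, no intermediate text variables are created: the chunk texts pass straight through the function into the assembled block, which is the property the proc/call variant is preferred for above.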