A2E::SArb - Parsing a String into a Syntactic Tree, recursively defined as a reference to a list of trees


Parse and transform lines of text that contain nested parentheses, function calls and the like, into semantic tree structures of arbitrarily many levels of hierarchy.

Analyse text where regular expressions are not good enough and generating a full-fledged Yacc-based parser would be overkill.

Usually this is done by writing a subclass of A2E::SArb; see e.g. A2E::SArb::Make and A2E::SArb::MLHT.


        use A2E::SArb::Make;

        my $vars = {};
        my $p = new_ready A2E::SArb::Make vars => $vars;
        $vars->{lib} = 'A2E::Prog';
        $p->transform('my library is ${lib}');
        # => my library is A2E::Prog

Of course much more complicated nested parsing is possible, see e.g. the A2E::SArb::Make application which imitates and extends the syntax of GNU Make.


Similar CPAN Modules


This allows plugins but is line-based and carries a lot of baggage that made it seem less likely to be efficient for the applications envisaged here. The others are again probably overkill for the purpose of line-to-tree parsing envisaged here.

Parse::Yapp, Parse::Eyapp
Parse::YYLex, perl-byacc

perl-byacc is not on CPAN as it is a C-based yacc compiler that produces perl programs.


    Hartmut Pilch


    Copyright (c) 2008 Hartmut Pilch (phm at a2e de)

    This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


Constructor parts

Function sarb_defvars

Function defvars

Function postkonfig

Function add_parser

Register a parser, i.e. a function that tries to find a syntactic expression at a given position and, if successful, returns the new position to which it has advanced together with the syntactic expression ($arb) that it found.

Currently 'try_paren' and 'try_keyvals' are defined here, and 'make' is defined in A2E::SArb::Make.
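The parser contract described above (position in, new position and $arb out) can be illustrated with a minimal stand-alone sketch. The function name try_paren_sketch, the 'paren' type tag and the empty-list failure convention are assumptions made for illustration; this is not the module's own try_paren.

```perl
use strict;
use warnings;

# Hypothetical parser following the contract described above: given a string
# and a start position, return (new position, tree) on success, or an empty
# list on failure. The tree is a reference to a list whose first element is
# a type tag and whose other elements are strings or nested trees.
sub try_paren_sketch {
    my ($str, $p) = @_;
    return if $p >= length $str;
    return unless substr($str, $p, 1) eq '(';
    my @arb = ('paren');
    my $lit = '';        # literal text collected at the current level
    my $q   = $p + 1;
    while ($q < length $str) {
        my $c = substr($str, $q, 1);
        if ($c eq '(') {
            # Flush pending literal text, then recurse into the nested group.
            push @arb, $lit if length $lit;
            $lit = '';
            my ($q2, $sub) = try_paren_sketch($str, $q);
            return unless defined $q2;
            push @arb, $sub;
            $q = $q2;
        }
        elsif ($c eq ')') {
            push @arb, $lit if length $lit;
            return ($q + 1, \@arb);
        }
        else {
            $lit .= $c;
            $q++;
        }
    }
    return;    # unbalanced parentheses: no expression found
}

my ($newpos, $arb) = try_paren_sketch('f(a(b)c)d', 1);
```

Under these assumptions, $arb holds [ 'paren', 'a', [ 'paren', 'b' ], 'c' ] and $newpos points at the trailing 'd'.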

Function switch_parser

    $m->switch_parser('paren', 1)
    $m->switch_parser('paren', 0)

Activate and deactivate a parser.

Parser Functions

Function try_parse

Function add_parse

Try to find a syntactic expression at the position p and return

Function add_subarb

Function string

Function substring

Function try_paren

Function skip

Function skip_non

Function try_keyvals

Find an expression like

    a=1 b="2.0"

and render it as [ 'keyvals', 'a', '1', 'b', '2.0' ].
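A minimal sketch of such a key/value scan is shown below. The name try_keyvals_sketch, the regular expression and the return convention (new position plus tree, or an empty list on failure) are assumptions for illustration, not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical sketch: starting at position $p, collect consecutive
# key=value pairs (value either double-quoted or a run of non-blanks)
# and return (new position, ['keyvals', key, value, ...]).
sub try_keyvals_sketch {
    my ($str, $p) = @_;
    pos($str) = $p;
    my @arb = ('keyvals');
    # \G anchors each match where the previous one ended; /c keeps pos()
    # unchanged when a match attempt fails.
    while ($str =~ /\G\s*(\w+)=(?:"([^"]*)"|(\S+))/gc) {
        push @arb, $1, defined $2 ? $2 : $3;
    }
    return unless @arb > 1;    # no pair found at this position
    return (pos($str), \@arb);
}

my ($kv_pos, $kv_arb) = try_keyvals_sketch('a=1 b="2.0"', 0);
```

Here $kv_arb has the shape given above, with the quotes around 2.0 removed.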

This is [probably] designed for Deplate and not useful for MLHT.

Function parse_from

    arg1 $p :: position in string to start from
    ret1 $p :: position at which we arrived
    ret2 $arb :: resulting syntactic tree

Rendering Subroutines

Function getval

Return a pre-stored variable value; common subroutine of getfun and getarb.

    args := typ, nom, vars
    rets := val
    typ => 'arbs' or 'vars' or 'funs', default 'vars'
    typ :: name of global symbol table to search in
    nom => 'url'
    nom :: variable name
    vars => { url => '' }
    vars :: local symbol table
    val :: string value
    val => ''
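The lookup specified above might be sketched as follows. The named-argument calling convention and the rule that the local table is consulted before the global one are assumptions; the specification only names the arguments and the tables.

```perl
use strict;
use warnings;

# Hypothetical sketch of the lookup: $m is an object hashref holding the
# global tables 'vars', 'arbs' and 'funs'. Assumption: a name present in
# the local table wins over the global table selected by typ.
sub getval_sketch {
    my ($m, %arg) = @_;
    my $typ  = $arg{typ} || 'vars';    # default table is 'vars'
    my $nom  = $arg{nom};
    my $vars = $arg{vars} || {};
    return $vars->{$nom} if exists $vars->{$nom};
    return $m->{$typ}{$nom};
}

my $m_demo = { vars => { url => 'global-url', lib => 'A2E::Prog' } };
my $local  = getval_sketch($m_demo, nom => 'url', vars => { url => '' });
my $global = getval_sketch($m_demo, nom => 'lib');
```

With these inputs $local is the empty string from the local table and $global is 'A2E::Prog' from the global one.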

Function getvar

Return a stored variable value. arg1 is the key, arg2 is the local symbol table (hashref).

Function getarb

Return the already-parsed arb corresponding to nom or parse+store+return it.

    args := nom, vars?
    rets := arb
    arb :: tree structure corresponding to a dynamic variable (function)

Function getfun

arg1 is key, arg2 is local symbol table (hashref)

Function maptrafo

Help deal with constructions like

    @tr = |pet|${TRIZ}|ahurl+${url}|

i.e. evaluate variables therein and allow recursion into multilayer structure on the last function argument, which is dealt with as lastfun in pred.

Function getfuns

This takes the ahurl verb in a construction like


or, after definition of

   oam2osm := $(fill oam2osmS,$(s|ahurl|../spez))

the locally defined s verb in the value of oam2osmS as first argument, and returns a corresponding Perl function with its initial arguments.

    args := verb, vars
    verb #= 'ahurl' or 's' from examples above
    vars := hashref((nom => lst)+)
    rets := funkorp, funarg+
    funkorp := perl_function_code
    funarg := commandline arguments of funkorp that come before any user-supplied command arguments


Function parse

Analyse a string into a tree/arb.

Rendering Functions

Function pred

Recurse into a potentially multilevel last argument as in

    @tr = |pet|${TRIZ}|ahurl+${url}|
    @shout = |underline|+boldface+ahurl/${url}+|

such that the text arguments of this function are applied to the inner-most level.

This is a subroutine of pred which is supported by subroutine maptrafo of getfuns.

Function render

Render the syntactic tree arg2 into a string in the target format, using the local definitions found in hashref arg1 as well as the global ones found in the namespaces $m->{vars}, $m->{arbs} and $m->{funs} of the object.

This is recursively invoked by many of the command verbs.

Function transform

Parse and render: like render, but starts from a string (in the source format) rather than a tree as first argument. Do this only once.

Function retransform

Parse and render: like render, but starts from a string (in the source format) rather than a tree as first argument. Keep transforming until the result string is the same as the source string, i.e. until there is nothing more to parse.

This is now being used experimentally in the _* variables, aka dynavars, of A2E::Tmplfil.
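The difference between a single pass and transforming until nothing changes can be sketched with a toy substitution. Everything here is hypothetical: retransform_sketch, the $once closure, the %toy_vars table and the pass limit (a safeguard added for the sketch) are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical fixed-point loop: apply a one-pass transform repeatedly
# until the output equals the input. The pass limit is an added safeguard
# against definitions that expand forever.
sub retransform_sketch {
    my ($transform, $src, $max) = @_;
    $max = 100 unless defined $max;
    for (1 .. $max) {
        my $out = $transform->($src);
        return $out if $out eq $src;    # nothing more to parse
        $src = $out;
    }
    die "no fixed point after $max passes\n";
}

# Toy one-pass transform: replace each ${name} once, leaving unknown
# names untouched.
my %toy_vars = (lib => '${ns}::Prog', ns => 'A2E');
my $once = sub {
    my $s = shift;
    $s =~ s/\$\{(\w+)\}/exists $toy_vars{$1} ? $toy_vars{$1} : '${' . $1 . '}'/ge;
    return $s;
};

my $single = $once->('my library is ${lib}');
my $fixed  = retransform_sketch($once, 'my library is ${lib}');
```

A single pass leaves the nested ${ns} unexpanded in $single, while the loop keeps going until $fixed contains no further substitutions.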

Function miniverb

Used in directives like

  itemlist_join = $(aresto|${BRA}list type=ul:|1|join|:call:ul_item|1|${ARB})
  special miniverb ul itemlist_join

to store the special short form 'ul' of verb 'itemlist_join' in a hashref $m->{miniverbs} that is consulted (by getfuns) when a function is applied to text, allowing users to put some special verbs into a namespace where they do not interfere with lingual text chunks (lits). The local miniverb definitions performed by the fill function work in the same way and take precedence over the global miniverb definitions performed here. However, the global miniverbs are pre-compiled into a tree and thus execute faster. This also means that in order to change the verb's meaning later, the miniverb assignment has to be invoked once more.

This requires that function $m->{funs_intern}->{arb_args_call} is defined by the implementing library.


Support for something like the following

                ul = <itemizedlist><title>${1}</title>${2@<listitem><para>@</para></listitem>}</itemizedlist>

will require an extension of &A2E::SArb::Make::pars_make, &A2E::SArb::Make::render_call and possibly &A2E::SArb::render, see A2E::SArb::Make(3) for more.


