Jump to content

Talk:Straight-line grammar

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

Renaming

[edit]

I would like to rename this article into "Straight-line Grammars" and possibly merge it with article Smallest grammar problem. "Context-free grammar generation algorithms" is misleading as it seems to suppose strong links with grammatical inference (there are links, but not so strong). Anyone against?

Mgalle (talk) 14:39, 24 August 2010 (UTC)[reply]


Yes, I'm against. Straight-line grammars are of interest outside the Smallest Grammar Problem. They are used in data compression, where the optimal grammar for compression is typically NOT the smallest grammar in the sense of the Smallest Grammar Problem. For example, Conrad and Wilson's GLZA algorithm achieves better compression than any prior SLG-based algorithm in part by avoiding the construction of rules for infrequently-repeating patterns made of frequently-repeating elements. (In that case the short entropy codes used to "spell out" the repeats each time add up to less than the cost of creating rules to encode the repeats as repeats.)

The "smallest-grammar problem" was designed to resemble the actual SLG compression problem, but it has misled many people into thinking that they're the same problem, which they're very much not not. GLZA systematically beats any compressor that constructs an "irreducible grammar" (where the output contains no repeats), but it's not the first to recognize this. An earlier algorithm for DNA sequence compression recognized this and removed rules that clearly reduced compression, after constructing an irreducible SLG. (I believe it's cited in the GLZA paper.) GLZA solves the problem by not creating such rules in the first place.

The GLZA reference (DCC 2016) is at http://ieeexplore.ieee.org/document/7786210/

Full paper can be found at https://encode.ru/threads/2427-GLZA

Name

[edit]

Lohery et al. actually call it a "straight-line context-free tree grammar (SLCF tree grammar)"; italics in original. I'm guessing there might some other source that calls it less verbosely. JMP EAX (talk) 14:11, 17 August 2014 (UTC)[reply]

I found exactly one source in Google Books to call it just straight-line grammar [1]. I don't know yet if that's because the notion isn't common or because other people might call it something else. JMP EAX (talk) 14:17, 17 August 2014 (UTC)[reply]

Google Scholar finds a few more hits, but I note that some of the authors there put "straight line" in scare quotes because the grammar is really acyclic [in non-terminals] not "straight line" [2], although what comes out of it is a single "straight line" string (for which the adjectival phrase is redundant of course). JMP EAX (talk) 14:26, 17 August 2014 (UTC)[reply]

I think the phrase "straight-line program" may be used more commonly for this, as well as in various other contexts, such as Smale's tau conjecture, which make it hard to search for. There are 21 hits for "straight line" grammar on ZMATH, mainly using that term or "grammar-based compression". See for example this. Deltahedron (talk) 17:05, 18 August 2014 (UTC)[reply]

In compression papers (mostly in conference proceedings and journals, not books), it's common to use the term Straight-line Grammar and abbreviate it SLG. It's easier to understand why it's called "straight-line" if you think in terms of programs, though. There's a duality between the directed graph and a program with a procedure for each rule. The recursive graph traverser can be considered an interpreter for a graph-reduction machine, where each rule is a (parameterless) procedure to be interpreted sequentially. (Each item on the right-hand side is a statement to be interpreted, which may say to emit a literal (at the leaves), or to call a parameterless procedure, i.e., recursively interpret the named rule.) On this view, there are procedure calls but no explicit conditionals---no if/then statements---and each procedure body is a "straight-line" sequence of either literals to be printed or similar straight-line procedures to be called. There's implicitly a conditional for deciding whether to emit a literal or recurse to interpret a procedure body, of course, but each procedure body is a strict sequence of such things. (And that conditional would disappear if the "procedure" were compiled rather than interpreted---every time a rule/procedure was executed it would do exactly the same thing. And really that's what's happening with a memoized graph reducer---the first time the rule is used, it's interpreted to produce a literal string. A pointer to that string is remembered and re-used at subsequent decodings of the symbol, which are just string copies. So really the "interpretation" is being compiled on-the-fly to a simple, fast string copy.) This corresponds to the really profoundly impoverished kind of "context-free grammar" we're using, i.e., one with no real abstraction in it. (There are no abstract categories like <Noun> or <VerbPhrase>, so each nonterminal corresponds to just one literal string. This trivially makes decoding fast linear time, but it's a trivial,weak subset of "context-free" grammars. It's so restricted by the "straight-line" condition that it makes "context-free" seem redundant and verbose. It's interesting to consider what slightly more general grammars could be fast to decode, and perhaps support better compression.) — Preceding unsigned comment added by 2602:306:CD5B:FD30:61C0:3B3E:902:22A7 (talk) 02:32, 3 December 2017 (UTC)[reply]