SimpleParse 2.1


WebSite: http://simpleparse.sourceforge.net



Description:

SimpleParse is a BSD-licensed Python package providing a simple and fast parser generator using a modified version of the mxTextTools text-tagging engine. SimpleParse allows you to generate parsers directly from your EBNF grammar.

Unlike most parser generators, SimpleParse generates single-pass parsers (there is no distinct tokenization stage). This approach comes from the predecessor project (mcf.pars), which attempted to create "autonomously parsing regex objects". The resulting parsers are not as generalized as those created by, for instance, the Earley algorithm. They do tend to be useful for parsing computer file formats and the like, as distinct from natural language and similar "hard" parsing problems.

As of version 2.1.0 the SimpleParse project includes a patched copy of the mxTextTools tagging library with the non-recursive rewrite of the core parsing loop. This means that you will need to build the extension module to use SimpleParse. In return, it provides a uniform parsing platform where all of the features of a given SimpleParse version are always available.

For those interested in working on the project, I'm actively interested in welcoming and supporting both new developers and new users. Feel free to contact me.

Documentation

- Scanning with SimpleParse -- describes the process of creating a Parser object with your EBNF grammar, and using that parser to scan input texts
- SimpleParse Grammars -- reference to the various features of the default SimpleParse EBNF grammar variant
- Processing Result Trees -- brief description of the results of the tagging/scanning process and the features available for processing (and altering) those results
- Common Problems -- description of a number of common bugs, errors, pitfalls and anti-patterns when using the engine
- IBM DeveloperWorks article by Dr. David Mertz -- discusses (and teaches the use of) SimpleParse 1.0, contrasting the EBNF-based parser with tools such as regexen for text-processing tasks. Watch also for Dr. Mertz' upcoming book, Text Processing with Python.
- mxTextTools documentation -- documents the underlying mxTextTools engine. Most users of SimpleParse who aren't actually creating custom/prebuilt parsing elements should not need this link.
- PyDoc references -- automatically generated documentation on the various elements within the package. Of particular interest are the library of reusable structures (simpleparse.common) and the Parser class, which is the primary interface for the parsing system.

Acquisition and Installation

You will need a copy of Python with distutils support (Python versions 2.0 and above include this). You'll also need a compiler that is compatible with your Python build and understood by distutils.

To install the base SimpleParse engine, download the latest version in your preferred format. If you are using the Win32 installer, simply run the executable. If you are using one of the source distributions, unpack the distribution into a temporary directory (maintaining the directory structure), then run:

    setup.py install

in the top directory created by the expansion process.
This will cause the patched mxTextTools library to be built as a sub-package of the simpleparse package, and will then install the whole package to your system.

Features/Changelog

New in 2.1.0a1:

- Includes (patched) mxTextTools extension as part of SimpleParse; no longer uses stand-alone mxTextTools installations
- Retooled setup environment to build and distribute directly from the CVS checkout
- Bug-fixes in c_comment and c_nest_comment common productions (thanks to Stephen Waterbury); basic tests for the comment productions added
- Bug fix for error-on-fail SyntaxErrors when used with an optional string message (2.0.1.a3):

    diff -w -r1.4 error.py
    32c32
    < return '%s: %s'%( self.__class__.__name__, self.messageFormat(message) )
    ---
    > return '%s: %s'%( self.__class__.__name__, self.messageFormat(self.message) )

- Case-insensitive literal values declared with c"literal" in the default grammar (new in 2.0.1a2)
- Significant optimisations to the generated parse tables, which can result in huge speedups for very formal grammars
- New, refactored and simplified API. Most of the time you only need to deal with a single class for all your interactions with the parser system (simpleparse.parser.Parser), plus one module if you decide to use the provided post-processing mechanism (simpleparse.dispatchprocessor).
- "Expanded Productions" -- allow you to define productions whose children are reported as if the enclosing production did not exist, so you can use productions for organisational as well as reporting purposes
- Exposure of the callout mechanism in mxTextTools
- Exposure of the "LookAhead" mechanism in mxTextTools, which allows you to spell "is followed by", "is not followed by", or "matches x but doesn't match y" in SimpleParse EBNF grammars. It is specified with the prefix ?, as in ?-"this", which matches only if "this" is not the next item. On matching, it doesn't move the read-head forward. More precisely, it causes the engine to continue processing at the previous position.
"Error on fail" error-reporting facility, allows you to raiseParser Syntax Errors when particular productions (or element tokens)fail to match. Allow for fairly flexible error reporting.To specify, just add a '!' character after the element tokenthat must match, or include it as a stand-alone token in a sequentialgroup to specify that all subsequent items must succeed. You canspecify an error message format by using a string literal after the !character. Library of common constructs (simpleparse.common package)which are easily included in your grammars Hexidecimal escapes for string and character ranges Rewritten generators -- the generator interface has beenseperated from the parser interfaces, this makes it possible towrite grammars directly using generator objects if desired, andallows defining the EBNF grammar using the same tools as generatederived parsers An XML-Parser (including DTD parsing) based on the XMLspecification's EBNF (this is not a production parser, merely anexample for parsing a complex file format, and is not yet Unicodecapable) Example VRML97 and LISP parsers Compatability API for SimpleParse 1.0 applications With the non-recursive mxTextTools, can process (albeitinefficiently) recursion-as-repetition grammars Non-recursive rewrite of mxTextTools now ~95% of the speed of therecursive version General Simple-to-use interface, define an EBNF and start parsing Fast for small files -- this is primarily a feature of theunderlying TextTools engine, rather than a particular feature of theparser generator. Allows pre-built and external parsing functions, which allows youto define Python methods to handle tricky parsing tasks"Class" of Parsers GeneratedOur (current) parsers are top-down, in that they work from the topof the parsing graph (the root production). 
These parsers are not, however, tokenising parsers, so as far as I can see there is no appropriate LL(x) designation. There is also an arbitrary lookahead mechanism, which could theoretically parse the entire rest of the file just to see if a particular character matches. I would hazard a guess that they are theoretically closest to a deterministic recursive-descent parser. There are no backtracking facilities, so any ambiguity is handled by choosing the first successful match of a grammar. This is not the longest match, as in most top-down parsers, mostly because without tokenisation it would be expensive to check each possible match's length. As a result, the parsers are entirely deterministic.

In general, the time to parse an input text varies with the amount of text to parse. There are two major factors. The first is the time to do the actual parsing. For simple deterministic grammars this should be close to linear in the length of the text, though a pathological grammar might have radically different operating characteristics. The second is the time to build the results tree, which depends on the memory architecture of the machine, the currently free memory, and the phase of the moon. As a rule, SimpleParse parsers will be faster (for suitably limited grammars) than anything you can code directly in Python. They will not generally outperform grammar-specific parsers written in C.

Missing Features

- SimpleParse does not currently use an Earley or similar highly generalised parser. Instead, it uses a simple deterministic parsing algorithm which, though fast for certain classes of problems, is incapable of dealing with ambiguity, backtracking or cross-references.
- The library of common patterns is extremely sparse
- Unicode support
- There is no analysis and only minimal reduction done on the grammar. Having now read most of Parsing Techniques - A Practical Guide, I can see that some fairly significant changes will be required to support such operations (and thereby the more common parsing techniques).
- Alternative parsing back-ends -- the new object generator module is fairly well isolated from the rest of the system, and encompasses most of the dependencies on the mxTextTools engine. Adding an optional Earley or similar back-end should be possible with minimal upset to the project. A back-end using re objects is another possibility. My precursor mcf.pars engine was written to use regexen for parsing, and was an acceptable (though not stellar) performer.
- Alternative EBNF grammars -- SimpleParse's EBNF, though fairly readily understood, is by no means the only EBNF variant. Providing support for a number of EBNF variants would ease the job of porting grammars to the system.
- More common/library code -- common data formats, HTML and/or SGML parsers
- mxTextTools Rewrite Enhancements
    - Case-insensitive matching commands?
    - Backtracking support?
- Alternate C Back-end? Given the amount of effort poured into the mxTextTools engine, this may seem silly. Still, it would be nice to implement a more advanced parsing algorithm directly in C, without going through the assembly-like interface of mxTextTools. Marc-André isn't interested in adopting the non-recursive codebase, so there's not much point in retaining compatibility with mxTextTools. Moving to a more parser-friendly engine might be the best approach.

mxBase/mxTextTools Installation

NOTE: This section only applies to SimpleParse versions before 2.1.0. SimpleParse 2.1.0 and above already include a patched version of mxTextTools!

You will want an mxBase 2.1.0 distribution to run SimpleParse, preferably with the non-recursive rewrite. If you want to use the non-recursive implementation, you will need to get the source archive for mxTextTools.
It is possible to use mxBase 2.0.3 with SimpleParse, but not to use it for building the non-recursive TextTools engine. Version 2.0.3 also lacks a lot of features and bug-fixes found in the 2.1.0 versions.

Note: without the non-recursive rewrite of 2.1.0 (i.e. with the recursive version), the test suite will not pass all tests. I'm not sure why they fail with the recursive version, but this argues for using the non-recursive rewrite.

To build the non-recursive TextTools engine, you'll need to get the source distribution for the non-recursive implementation from the SimpleParse file repository. Note that there are incompatibilities in the mxBase 2.1 versions. These make it necessary to use the versions specified below to build the non-recursive versions:

- Python 2.2.x: mxBase 2.1b5, non-recursive 1.0.0b4
- Python 2.3.x: mxBase 2.1 August 2003 snapshot, non-recursive 1.0.0b5+

This archive is intended to be expanded over the mxBase source archive from the top-level directory, replacing one file and adding four others:

    cd egenix-mx-base-2.1.0
    gunzip non-recursive-1.0.0b1.tar.gz
    tar -xvf non-recursive-1.0.0b1.tar

(Or use WinZip on Windows.) When you have completed that, run:

    setup.py build --force install

in the top directory of the eGenix-mx-base source tree.

Copyright, License and Disclaimer

The 2.1.0 and greater releases include the eGenix mxTextTools extension, licensed under the eGenix.com Public License. See the mxLicense.html file for details on licensing terms for the original library. The eGenix extensions are:

    Copyright (c) 1997-2000, Marc-Andre Lemburg
    Copyright (c) 2000-2001, eGenix.com Software GmbH

Extensions to the eGenix extensions (most significantly the rewrite of the core loop) are copyright Mike Fletcher and released under the SimpleParse License below:

    Copyright 2003-2006, Mike Fletcher

SimpleParse License:

Copyright 1998-2006, Copyright by Mike C. Fletcher; All Rights Reserved.
mailto: mcfletch@users.sourceforge.net

Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee or royalty is hereby granted, provided that the above copyright notice appear in all copies and that both the copyright notice and this permission notice appear in supporting documentation or portions thereof, including modifications, that you make.

THE AUTHOR MIKE C. FLETCHER DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE!
