Mike Lindegarde... Online

Things I'm likely to forget.

Getting Started with Visual Studio 2015, Antlr 4, C#, and a SQL Grammar

The Part You Can Skip...

Back when I had a lot of free time (high school / college) I enjoyed studying the art of compiler design.  My first lexer and parser (LL(1)) were written from scratch.  By college I had discovered lex / yacc and flex / bison.  Unfortunately, I haven't made the time to dabble in this area in the decade or so that has elapsed since leaving school.

Fortunately, I recently had a real-world problem I was able to overcomplicate enough to warrant a domain-specific language.  Of course I could have solved the problem with a switch statement, but switch statements are evil (I can't back that up with facts).  Instead I took the opportunity to play with Antlr.  Given the current state of the open source community and the abundance of Stack Overflow answers in search results, I expected this to be much more straightforward than it was.

Although I'm perfectly content to develop in Java, my use case for the DSL is based in .NET.  Antlr 4 has a C# target, so I didn't expect this to be a problem.  It turns out that the documentation is either sparse or a bit outdated.  Below you'll find what I had to do to get things up and running using Antlr 4 and Visual Studio 2015...

Setting up Visual Studio 2015

You can find the outdated documentation for the C# Antlr 4 target on GitHub.  The documentation is pretty complete and still useful.  The biggest problem you'll run into is that the Antlr language support extension it references has not been updated for Visual Studio 2015.

Step 1: Create a new Class Library project

Generating the C# lexer and parser happens through the magic of NuGet packages.  All you need is a class library project, a valid grammar file (or two), and the appropriate NuGet packages.

Step 2: Add NuGet packages to your class library

Install-Package Antlr4
Install-Package Antlr4.Runtime

As of version 4.3.0, the NuGet package will not work with .NET 4.5.1 or newer.  I'm currently targeting 4.6, so it took me a moment or two to realize this was the problem.  As long as you're isolating your lexer and parser in their own project (as you should be) this shouldn't be an issue.  Just edit that project to target the older version of the .NET framework.
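If you prefer to make that change by hand rather than through the project properties page, the setting lives in the csproj file.  A minimal sketch, assuming v4.5 as the target (my guess at the newest version below the 4.5.1 cutoff; use whatever older version suits your project):

```xml
<PropertyGroup>
  <!-- v4.5 is an assumption: the newest framework version below the 4.5.1 cutoff -->
  <TargetFrameworkVersion>v4.5</TargetFrameworkVersion>
</PropertyGroup>
```

A typical class library project already has this element in its first PropertyGroup, so you would change the existing value rather than add a second one.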

Step 3: Add your grammar file(s)

If the Antlr extension for Visual Studio worked with 2015 this would be trivial: just follow the documentation.  But... it doesn't (at least not at the time of writing this).

You must make sure the grammar file has the correct encoding.  If you don't, things won't work.  The GitHub documentation walks you through using the advanced save options in Visual Studio to set the proper encoding.  Although I found the advanced save options in Visual Studio 2015, I did not see Unicode (UTF-8 without signature) - Codepage 65001 listed as an option.  I ended up using Notepad++ and setting the encoding to UTF-8-BOM.

I was not able to edit the grammar file properties as outlined in the official documentation.  Instead I resorted to editing the csproj file.  You can do this two ways:

  1. Unload the project in Visual Studio, then open the project file in Visual Studio for editing.  Once you're done, reload the project.
  2. Use an external editor (like Notepad++, Atom, Sublime, Vim, whatever floats your boat) to edit the csproj file.  When you switch back to Visual Studio you'll be prompted to reload the project.

I had to add the following section to my csproj file:

  <ItemGroup>
    <Compile Include="Properties\AssemblyInfo.cs" />
  </ItemGroup>
  <ItemGroup>
    <Antlr4 Include="SomeParser.g4">
      <Generator>MSBuild:Compile</Generator>
      <CustomToolNamespace>RootNamespace.Folder</CustomToolNamespace>
    </Antlr4>
  </ItemGroup>
  <ItemGroup>
    <None Include="packages.config" />
  </ItemGroup>

Make sure you remove the grammar file from any other section it may appear in.  If you have more than one g4 file (your lexer and parser are split) you will need to add an Antlr4 element for each file.
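For a split grammar, the ItemGroup might look something like the sketch below.  SomeLexer.g4 is a hypothetical file name I'm using for illustration; the Generator and CustomToolNamespace values simply mirror the single-file example above.

```xml
<ItemGroup>
  <!-- SomeLexer.g4 is a hypothetical name for the split lexer grammar -->
  <Antlr4 Include="SomeLexer.g4">
    <Generator>MSBuild:Compile</Generator>
    <CustomToolNamespace>RootNamespace.Folder</CustomToolNamespace>
  </Antlr4>
  <Antlr4 Include="SomeParser.g4">
    <Generator>MSBuild:Compile</Generator>
    <CustomToolNamespace>RootNamespace.Folder</CustomToolNamespace>
  </Antlr4>
</ItemGroup>
```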

Finding a SQL Grammar

At this point you should be able to build your class library and generate an assembly that has your lexer, parser, base visitor, and base listener classes in it.  The next hurdle is finding an existing Antlr 4 SQL grammar.

After some digging, I found a SQL grammar file on GitHub (where else?).  Whether or not you can legally use the grammar I've linked you to is something you need to decide.  According to a Google Group discussion I found, Salesforce.com may have released their SQL grammar.  I have not looked for that.  Let me know if you find it.  You can also check the GitHub Antlr 4 grammar repository for grammar files.  As of writing there wasn't a SQL grammar file there.
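Once you have a grammar and the project builds, calling the generated classes from C# looks roughly like the sketch below.  SqlLexer, SqlParser, and the start rule statement are placeholder names I made up; the real names depend on the grammar you end up using.  The namespace comes from the CustomToolNamespace value in the csproj.

```csharp
using System;
using Antlr4.Runtime;
using Antlr4.Runtime.Tree;
using RootNamespace.Folder; // namespace set via <CustomToolNamespace>

public static class ParseDemo
{
    public static void Main()
    {
        // SqlLexer, SqlParser, and statement() are placeholders:
        // substitute the classes and start rule your grammar generates.
        var input = new AntlrInputStream("SELECT name FROM users");
        var lexer = new SqlLexer(input);
        var tokens = new CommonTokenStream(lexer);
        var parser = new SqlParser(tokens);

        // Invoke the start rule to build the parse tree.
        IParseTree tree = parser.statement();

        // Dump the tree in LISP-style text as a quick sanity check.
        Console.WriteLine(tree.ToStringTree(parser));
    }
}
```

From here you would normally subclass the generated base visitor or base listener rather than work with the raw tree.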

Random Other Links

Here are a few other random links that helped me in some way:

Helpful Beginner Hints

Make sure you order your rules with the most specific at the top.  When two rules match the same text, Antlr picks the one defined first, so if you don't you'll end up in a situation where your intended rule can't be reached because a more generic one is reached first (e.g. defining a string literal that can have spaces before you define an identifier token that can't have spaces).
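For lexer rules, the classic case is keywords versus identifiers.  The fragment below is a made-up illustration, not taken from any particular SQL grammar:

```antlr
// Hypothetical lexer rules for illustration only.
SELECT     : 'SELECT' ;                  // specific keywords first
FROM       : 'FROM' ;
IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]* ;   // generic catch-all last
WS         : [ \t\r\n]+ -> skip ;
```

If IDENTIFIER were listed above SELECT, the input SELECT would always come out as an IDENTIFIER token, because both rules match the same six characters and the rule defined first wins the tie.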

Take a look at ANTLRWorks 2.  It's incredibly useful when writing your grammar files.

I strongly encourage you to read a book or two.  Blogs (like this one) are great at getting you started; however, books do a better job providing you with the full picture.

Let me know if you know something I don't.  I'd love to update this article with more useful and / or up-to-date information.