t43562 2 days ago

What I always wanted from Clang and was not smart enough to understand was how to use it to get an AST or similar so that I could rewrite code.

I was working on a big project with old code that needed a radical overhaul to make it ready for new boost and new C++ versions. It seemed a huge task to do this accurately by hand. I also thought about combining this with some static analysis of a trivial kind - looking for behaviours that are not wrong in C++ overall but were wrong in the context of that project.

I know there are several tools for clang that do rewrites and the unfortunate problem was that clang at the time wasn't able to build that code for some reasons that I cannot remember now. As far as I remember, the rewriting tools that were available were not very sophisticated and made it difficult to do the kinds of changes that I thought would be needed.

So I'm not so interested in being able to compile but being able to parse and then walk the parse tree in some very easy language like python.

I think the answer now might be "libclang": https://pypi.org/project/libclang/

  • menaerus 2 days ago

    libclang has been with us for ~15 years already.

    > I'm not so interested in being able to compile but being able to parse

    Chances are that you're not going to be able to walk the AST without getting errors unless (lib)clang can cleanly parse the code. There are at least two prerequisites for that: (1) marrying your build system with (lib)clang so that you pass the exact include directories, build flags, etc. for any given translation unit, and (2) being able to build the code with clang. If you use CMake, it's going to be a little bit easier. If not, I recommend switching to it first. You will also have to be somewhat creative about parsing the code in header files correctly, since a header file fundamentally isn't a translation unit.
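
    As a rough sketch of prerequisite (1): CMake can write a compile_commands.json file (with CMAKE_EXPORT_COMPILE_COMMANDS=ON) that records the compiler invocation for each translation unit. The helper names `flags_for` and `parser_flags` below are made up for illustration, and the set of flags worth keeping is an assumption that will differ per project.

```python
import json
import shlex
from pathlib import Path


def flags_for(compile_commands_path, source_file):
    """Return the recorded compiler arguments for one translation unit, or None."""
    entries = json.loads(Path(compile_commands_path).read_text())
    wanted = Path(source_file).resolve()
    for entry in entries:
        # "file" may be relative to the entry's "directory"
        if (Path(entry["directory"]) / entry["file"]).resolve() != wanted:
            continue
        # An entry carries either an "arguments" list or a "command" string
        args = entry.get("arguments") or shlex.split(entry["command"])
        return args[1:]  # drop the compiler executable itself
    return None


def parser_flags(args):
    """Keep only include paths, defines and the language standard (an assumed subset)."""
    kept = []
    it = iter(args)
    for arg in it:
        if arg in ("-I", "-D", "-isystem"):
            # flag and value were passed as two separate arguments
            kept.extend([arg, next(it, "")])
        elif arg.startswith(("-I", "-D", "-std=", "-isystem")):
            kept.append(arg)
    return kept
```

    The filtered list is the kind of thing one would then hand to the parser, for example as the `args` parameter of `Index.parse` in the libclang Python bindings.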

    Back in the day, I wrote some utilities on top of libclang and the process was very hairy. Although I pulled it off, for the reasons above it required a lot of unsexy work around integrating the (CMake) build system, patching the codebase itself, and working around the design limitations imposed by the language itself. Occasionally, libclang wouldn't offer me enough control over the AST, so I also had to work around those limitations by patching the libclang code itself. LibTooling seemingly gives more control but I have never tried it.

    If you still want to try it, I think your best bet is to find something similar that already exists and build upon it.

  • dataflow 2 days ago

    If all you want is pure syntax with no semantics, check out tree-sitter. In some cases you can get it past macros by pre-processing the code first with an alternative preprocessor implementation.

    But most likely you want some semantics. The "one weird trick" you can pull here is that if all you want is syntactic manipulation, you mostly don't have to care about semantics and codegen, or anything past that (like linking or embedding resources) -- and you don't really have to care about most of the compiler flags, either. That takes out a fair bit of complexity. Moreover you can do the migration to Clang file-by-file, and you can do that with #ifndef __clang__ wherever you're certain the code is unaffected by your migration (which you can detect via compiling the file).

    And on top of that you can use the most relaxed warnings possible in Clang - think MSVC compatibility flags and disabling all warnings and such.

    I'm not suggesting it's trivial, but I've done it before, and it was not as daunting as I first imagined it. If you haven't tried these already, I would definitely give it a shot for a few files, and see how long it takes you on average per file.

  • inetknght 2 days ago

    > I was working on a big project with old code that needed a radical overhaul to make it ready for new boost and new C++ versions.

    You might find Kristen Shaker's CppCon 2023 talk to be intriguing.

    https://www.youtube.com/watch?v=torqlZnu9Ag

    > I know there are several tools for clang that do rewrites and the unfortunate problem was that clang at the time wasn't able to build that code for some reasons that I cannot remember now.

    Oh. Well that's unfortunate. You're probably going to have a bad time if no version of clang is able to compile (eg, libclang probably won't help). But GCC has something similar:

    https://stackoverflow.com/questions/15800230/how-can-i-dump-...

    Alas, I'm not familiar with either tool. I would find it interesting to read a long-form blog post from you with more information and what you end up doing!

  • yaantc 2 days ago

    castxml (https://github.com/CastXML/CastXML) may be what you want. It uses the Clang front-end to output an XML representation of a C or C++ parse tree. It is then possible to turn this into what you want. I've used it, and seen it used, to generate code that does endianness conversion of structures from headers, or for RPC code generation, for example.

    It can be used from Python through pygccxml (https://github.com/CastXML/pygccxml). The name comes from a previous instance, gccxml, based on the GCC front-end.

    Both castxml and pygccxml are packaged in Debian and Ubuntu.
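
    To give a feel for the "turn this into what you want" step, here is a minimal sketch that reads struct layouts out of gccxml-style output using only the Python standard library. The `SAMPLE` document is hand-written and heavily simplified, not real castxml output, and `struct_fields` is a made-up helper; it only handles fields whose type is a fundamental type.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the style of gccxml output (an assumption,
# real output has more attributes and more element kinds).
SAMPLE = """<GCC_XML>
  <Struct id="_3" name="Header" members="_4 _5" size="64"/>
  <Field id="_4" name="magic" type="_6" context="_3" offset="0"/>
  <Field id="_5" name="length" type="_7" context="_3" offset="32"/>
  <FundamentalType id="_6" name="unsigned int" size="32"/>
  <FundamentalType id="_7" name="unsigned short" size="16"/>
</GCC_XML>"""


def struct_fields(xml_text, struct_name):
    """Return (field name, type name, size in bits, offset in bits) per field."""
    root = ET.fromstring(xml_text)
    # Elements refer to each other by id, so index them first
    by_id = {el.get("id"): el for el in root if el.get("id")}
    for struct in root.iter("Struct"):
        if struct.get("name") != struct_name:
            continue
        fields = []
        for member_id in struct.get("members", "").split():
            member = by_id.get(member_id)
            if member is None or member.tag != "Field":
                continue  # skip constructors, methods, etc.
            ftype = by_id[member.get("type")]
            fields.append((member.get("name"), ftype.get("name"),
                           int(ftype.get("size")), int(member.get("offset"))))
        return fields
    return None
```

    From a list like this, a generator could emit a byte-swap call for each field wider than 8 bits. pygccxml wraps this kind of lookup in a proper API, so the sketch is only meant to show how little is needed to get started.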

esbranson 2 days ago
  • HdS84 2 days ago

    MS's best idea was to create Source Generators. Absurdly powerful for some problems.

    • pjmlp 2 days ago

      They still miss something like good old T4 templates though.

      • HdS84 8 hours ago

        Yes, that is missing. I thought about creating something like that, but it's a lot of work. Also, the huge number of source generators creates a problem: an SG cannot see code generated by another SG. There is no way to stack them. That will be a problem as more and more SGs generate source. E.g. I have an SG which creates interfaces. No other SG can use these interfaces.

fooblaster 3 days ago

This is pretty cool. What are the use cases?

  • secondcoming 2 days ago

    I’ve wanted something like this to compile XML business rules to native code on the fly.

    I would have a script that converted the XML to C and then turn that into machine code that I’d load directly from the .text section of the binary (shellcode style).

    Turns out this is error prone because compilers can emit things like jump tables into the .rodata section, so you need that too. It’s easier to just create a shared library.
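
    A minimal sketch of the shared-library route, assuming a made-up XML rule schema (`<rule name=... field=... op=... value=.../>`) and a C compiler reachable as `cc`. The helper names `rules_to_c` and `build_and_load` are hypothetical too.

```python
import ctypes
import subprocess
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Comparison operators allowed in the hypothetical rule schema
OPS = {"lt": "<", "le": "<=", "gt": ">", "ge": ">=", "eq": "=="}


def rules_to_c(xml_text):
    """Translate each <rule> element into a one-line C predicate function."""
    lines = []
    for rule in ET.fromstring(xml_text).iter("rule"):
        name, field = rule.get("name"), rule.get("field")
        # Only emit names that are plain identifiers, never raw XML text
        if not (name.isidentifier() and field.isidentifier()):
            raise ValueError(f"bad identifier in rule {name!r}")
        op = OPS[rule.get("op")]
        value = int(rule.get("value"))
        lines.append(f"int {name}(int {field}) {{ return {field} {op} {value}; }}")
    return "\n".join(lines) + "\n"


def build_and_load(c_source, compiler="cc"):
    """Compile the generated C into a shared library and load it with ctypes."""
    workdir = Path(tempfile.mkdtemp())
    src = workdir / "rules.c"
    lib = workdir / "librules.so"
    src.write_text(c_source)
    subprocess.run([compiler, "-shared", "-fPIC", "-o", str(lib), str(src)],
                   check=True)
    return ctypes.CDLL(str(lib))
```

    With a rule such as `<rule name="is_adult" field="age" op="ge" value="18"/>`, calling `build_and_load(rules_to_c(xml)).is_adult(21)` should return a non-zero int. The dynamic loader takes care of .rodata, relocations and the like, which is the part that goes wrong when copying .text by hand.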

    • high_na_euv 2 days ago

      XML business rules? Is it xml-oriented-programming again? They were doing it in Java 25 years ago

  • wdpk 3 days ago

    compile c++ code at runtime for instance. Lots of use cases, most obvious one specializing/instantiating dense computational kernels on values only known at runtime... but so many more things would be possible if the compiler was just a reusable library.

    • ska 2 days ago

      This is something Common Lisp got really right.

      • wdpk 2 days ago

        Well, being homoiconic and dynamic helps quite a bit... That said, if you squint a bit and get used to the syntax, C++ variadic templates are just a compile-time Lisp (really, templates are just generalized functions over types), and the template mechanism is 100% pure. With a runtime capability to evaluate those pure monadic computational effects, which today are defined at compile time, there would be no more boundary between compile time and runtime (not saying it's a thing that should be done all the time). The main advantage over functional languages is then that C++ optimizing compilers are already pretty good at optimization. So, assuming that you can afford to recompile the tight inner loops or critical paths at runtime (say at "configuration time", when adding some latency might not be a big deal), a lot of otherwise impossible optimizations could probably be done better (thinking of loop invariants, polyhedral, unrolling, constant propagation, aliasing, row-major to column-major, etc.). The result would probably also be better than what a JIT compiler and profiler would be able to achieve.

  • mshockwave 2 days ago

    I think this is exactly what the Zig compiler does under the hood for C/C++ sources. So I guess you can do a similar thing for your own programming language to support interop.

  • menaerus 2 days ago

    REPL and JIT come to mind.

tway223 2 days ago

I have been wondering if someone could improve golang's cgo infra using clang, the way Zig does.
