refactoring - Is there a command-line tool to extract a typedef, structure, enumeration, variable, or function from a C or C++ file?


I am looking for a command-line tool to extract a definition or declaration (typedef, structure, enumeration, variable, or function) from a C or C++ source file. A way to replace an existing definition/declaration would also be handy (after transforming the extracted definition with a user-submitted script). Is there such a generic tool available, or a reasonably close approximation of such a tool?

Scriptability and the ability to hook up user-created scripts or programs are of importance here, although I am academically curious about GUI programs too. Open source solutions from the Unix/Linux camp are preferred (although I am curious about Windows and OS X tools too). My primary language interests are C and C++, but a more generic solution would be even better (I think I do not need super accurate parsing capabilities for finding, extracting, and replacing a definition in a program source file).

Sample use cases (extra, for the curious mind):

  1. Given nested structs and variable (array) initializations of these types, suppose there is a need to change a struct definition by adding or reordering fields, or to rewrite the variable/array definitions in a more readable format, without introducing the errors that result from manual labor. This would work by extracting the old initializations, then using a script/program to write the new initializations that replace the old ones.
  2. Implementing a code browsing tool: extract a definition.
  3. Decorative code generation (e.g. logging function entries/returns).
  4. Scripted code structuring (e.g. extract this thing and put it in a different place without change; the version control commit comment can document the command used to perform the operation, making it evident and verifiable that nothing changed).
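To make use case 3 concrete, here is a minimal sketch of what a decorated function might look like after such a tool has run. The names `ScopeTrace` and `trace_log` are hypothetical and are not part of any tool discussed here; the point is only that the inserted text can be a single line per function.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical log sink; a real setup would more likely write to a file or stderr.
std::vector<std::string>& trace_log() {
    static std::vector<std::string> log;
    return log;
}

// Records entry on construction and exit on destruction, so every
// return path of the decorated function is covered.
class ScopeTrace {
public:
    explicit ScopeTrace(std::string name) : name_(std::move(name)) {
        trace_log().push_back("enter " + name_);
    }
    ~ScopeTrace() { trace_log().push_back("leave " + name_); }
    ScopeTrace(const ScopeTrace&) = delete;
    ScopeTrace& operator=(const ScopeTrace&) = delete;

private:
    std::string name_;
};

// Original function:  int square(int x) { return x * x; }
// After decoration, the only inserted line is the ScopeTrace declaration.
int square(int x) {
    ScopeTrace trace("square");
    return x * x;
}
```

Because the decoration is one declaration at the top of the body, the script doing the rewrite only needs to locate the opening brace of each extracted function definition.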

Alternative problem: if there were a tool that could tell the location of a definition (beginning and end line would suffice; assume the definitions/declarations I am interested in are each on their own lines), it would be an exercise in finger dexterity to write a program to

  1. extract definitions,
  2. replace definitions, or
  3. extract a definition, then run a program specified by command-line options (or an editor) to

    • receive the desired extracted definitions on stdin (or in a temporary file),
    • perform the transformation (editing), and
    • output the new definitions on stdout (or save them to the given temporary file),

    to be put in place of the originals by the executing program.

So the major, more challenging problem is finding the begin and end line of a definition.
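Assuming the begin and end lines are already known, the extract/transform/replace step could be sketched roughly as follows. The function name `replace_definition` and the in-memory callback are illustrative assumptions; a real tool would more likely pipe the extracted lines through an external program via stdin/stdout.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Lines = std::vector<std::string>;

// Replace the 1-based, inclusive line range [begin, end] of `source`
// with whatever `transform` returns for the extracted lines.
// `transform` stands in for the user-supplied script or editor.
Lines replace_definition(const Lines& source, std::size_t begin, std::size_t end,
                         const std::function<Lines(const Lines&)>& transform) {
    assert(begin >= 1 && begin <= end && end <= source.size());

    // Extract the definition.
    Lines extracted(source.begin() + (begin - 1), source.begin() + end);

    // Let the user-supplied step rewrite it.
    Lines replacement = transform(extracted);

    // Reassemble: lines before, new definition, lines after.
    Lines result(source.begin(), source.begin() + (begin - 1));
    result.insert(result.end(), replacement.begin(), replacement.end());
    result.insert(result.end(), source.begin() + end, source.end());
    return result;
}
```

The replacement may have a different number of lines than the original, which is why the result is rebuilt rather than edited in place.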

Note on tags: a more accurate tag such as code-generation or code-transformation does not exist.

Our DMS Software Reengineering Toolkit is trying to be the tool you are wishing for. It is pushing the state of the art, and it isn't a nirvana-style tool. But it does enough real, interesting work.

DMS provides general facilities for parsing, analyzing, and transforming source code.

It uses explicit grammars to define languages (such as C and C++); the grammars drive parsers that build abstract syntax trees (ASTs). A variety of analysis primitives provide a) facilities ["attribute grammars", ATGs] for collecting information along tree-like information flow paths, which match the shape of ASTs nicely, b) construction of symbol-use to symbol-definition maps ["symbol tables"], c) control and data flow analysis using facts extracted by ATGs, d) range analysis, and e) points-to analysis, both local and global. These primitive analyzers can be used to compose facts about the AST and draw conclusions about the code represented by the ASTs (e.g., "this statement modifies these variables"). A language front end packages a grammar and language-specific analyzers in a reusable bundle. DMS has such language front ends, of varying levels of depth and maturity, for a wide variety of languages.

[Edit 6/27: The C and C++ front ends have support for specific dialects of C and C++: ANSI C, C99, GCC3/4 C, MS Visual C, ANSI C++98, ANSI C++11, GCC3/4 C++, MS Visual C++ 2005/2008/2010. If you want accurate analysis of your code, you should use the "right" dialect to process the code.]

But "analysis" isn't the point. The purpose of analysis is to drive change. DMS provides additional support to procedurally modify ASTs, to modify ASTs with source-to-source rewrite rules written in the surface syntax of the language (both conditioned on a chosen analysis result), or to group sets of procedural and source-to-source rewrites into compound, complex rewrites that can carry off massive code changes such as re-architecting. After the ASTs are transformed, they can be used to regenerate ("prettyprint") syntactically correct code in the corresponding front-end language/dialect. [By modifying the AST for one language piecewise until you have an AST for another, you can build translators, but this isn't as easy as that sentence implies.]

This all works to a considerable degree, yet we are still stymied by language complications. For C and C++, the famous complication is the preprocessor; because they edit the program text arbitrarily, preprocessor conditionals can render the source code unparseable by anything resembling standard parsing technology. DMS's C and C++ front ends ameliorate this, and can parse code with well-structured preprocessor directives, including strange cases that most people would not call structured but that commonly occur:

   #if cond
        if (abc) {
   #else
        if (def) {
   #endif
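To see why this case is awkward, here is a small self-contained sketch that embeds the same pattern in a complete function. `COND` and `classify` are hypothetical names chosen for illustration. Each arm of the conditional opens a brace that is only closed after the `#endif`, so neither arm is a complete statement on its own, yet the file compiles for either setting of `COND`.

```cpp
#include <string>

// Hypothetical configuration switch; defaults to 0 if the build does not set it.
#ifndef COND
#define COND 0
#endif

std::string classify(int abc, int def) {
#if COND
    if (abc) {          // opens a block that is closed below, outside the #if
#else
    if (def) {          // alternative header for the same block
#endif
        return "taken";
    }                   // closes whichever `if` the preprocessor kept
    return "skipped";
}
```

A parser that wants to keep both arms in one tree has to represent two alternative `if` headers sharing a single body, which is not something an ordinary C or C++ grammar expresses.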

We are making interesting progress on parsing code with arbitrary placement of preprocessor conditionals. Once we do that, all of the analyzers will have to take preprocessor conditionals into account, and we're on turf that compiler people have not visited.

DMS has been used to make major architectural shifts in large C++ programs (converting from non-CORBA style to CORBA style, with an immense amount of code shuffling), to extract code along arbitrary control flow paths to generate sow-style APIs from existing C code, to insert instrumentation in large C programs to detect pointer errors, etc. [It has been applied to other tasks in many other languages.]

In our own experience, it is still pretty hard to use. In our opinion, this is true in the same sense that democracy is the worst of the systems of government except for all the rest; YMMV. The website has lots of DMS-derived tools and discussions.

It has in fact been used to extract functions (the sow exercise is more general than that) and to insert functions (this is a generalized case of instrumentation).

Tools like GCC-XML are shadows of DMS's capabilities. GCC-XML parses, builds symbol tables, and dumps data declarations (not code), but can't make any code changes. Clang is better; it parses C and C++ to ASTs, can do analyses on the LLVM intermediate representation, and has some kind of mechanism for spitting out to-be-applied-later patches to the source text, inspired by a desired tree change. I don't know if Clang can carry out massive code transformations in which one transformation's result is transformed again (how do you modify the tree for a delayed text patch?). DMS can do this all day long, can do it for many languages other than C and C++, and can do it for an arbitrary mixture of the languages it knows.

Until the preprocessor conditionals problem gets solved, analyzing/transforming C and C++ code is not going to be easy. We succeed in these tasks on these languages by sheer willpower and by using the strongest tools we can build. (Java doesn't have these problems, and DMS is correspondingly better at analyzing/transforming it.)

At severe risk of hubris, I believe DMS is the best of the tools out there for general-purpose analysis and transformation. As its architect, I view it as my long-term job to make it ever stronger for the task.

