Introduced changes
Maintainer: Masatake YAMATO <yamato@redhat.com>
Many changes have been introduced in Universal-ctags. Use git-log to review changes not enumerated here, especially in language parsers.
New and extended options
Wildcard in options
To gather as much information as possible from source code, the “wildcard” (*) option value has been introduced.
--extras=*
Enables all extra tags.
--fields=*
Enables all available fields.
--kinds-<LANG>=*
Enables all available kinds for LANG.
--kinds-all=*
Enables all available kinds for all available language parsers.
Long names in kinds, fields, and extra options
A letter is used for specifying a kind, a field, or an extra entry. In Universal-ctags a name can also be used.
Surround the name with braces ({ and }) in values assigned to the options --kinds-<LANG>=, --fields=, or --extras=.
$ ./ctags --kinds-C=+L-d ...
This command line uses letters: L for enabling the label kind and d for disabling the macro kind of C. The command line can be rewritten with the associated names.
$ ./ctags --kinds-C='+{label}-{macro}' ...
The quotes are needed because braces are interpreted as meta characters by the shell.
The available names can be listed with --list-kinds-full, --list-fields, or --list-extras.
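To make the letter/name equivalence concrete, here is a minimal Python sketch (not ctags' own option parser) that splits a kind specification such as +L-d or +{label}-{macro} into (sign, kind name) pairs. The kind table is a hypothetical excerpt holding only the two C kinds used above, and the sketch treats a missing leading sign as +, which is a simplification.

```python
import re

# Hypothetical excerpt of the C parser's kind table (letter -> long name),
# limited to the two kinds used in the example above.
C_KINDS = {"L": "label", "d": "macro"}

# One item: an optional sign, then a single letter or a {long name}.
ITEM = re.compile(r"([+-]?)(\{[^}]*\}|[A-Za-z])")


def parse_kind_spec(spec, kinds):
    """Split a --kinds-<LANG> value into (sign, long_name) pairs.

    A sign applies to every following item until the next sign.
    """
    result = []
    sign = "+"
    pos = 0
    while pos < len(spec):
        m = ITEM.match(spec, pos)
        if m is None:
            raise ValueError(f"unexpected character {spec[pos]!r} at {pos}")
        if m.group(1):
            sign = m.group(1)
        token = m.group(2)
        # Braced tokens are long names; bare letters are looked up in the table.
        name = token[1:-1] if token.startswith("{") else kinds[token]
        result.append((sign, name))
        pos = m.end()
    return result


print(parse_kind_spec("+L-d", C_KINDS))              # [('+', 'label'), ('-', 'macro')]
print(parse_kind_spec("+{label}-{macro}", C_KINDS))  # [('+', 'label'), ('-', 'macro')]
```

Both spellings resolve to the same pairs, which is the point of the long-name notation.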
Notice messages and --quiet
There were three classes of messages in ctags:
fatal
A critical error has occurred and ctags aborts the execution.
warning
An error has occurred but ctags continues the execution.
verbose
Mainly used for debugging purposes.
notice is a new class of message. It is less important than warning but more important for users than verbose.
Generally the user can ignore notice class messages, and --quiet can be used to disable them.
--input-encoding=ENCODING and --output-encoding=ENCODING
People may use their own native language in source code comments (or sometimes in identifiers), and in such cases encoding may become an issue. Nowadays UTF-8 is the most widely used encoding, but some source code still uses legacy encodings like latin1, cp932 and so on. These options are useful for such files.
ctags doesn’t consider the input encoding; it just reads input as a sequence of bytes and uses them as is when writing tags entries.
On the other hand Vim does consider input encoding. When loading a file, Vim converts the file contents into an internal format with one of the encodings specified in its fileencodings option.
As a result of this difference, Vim cannot always move the cursor to the definition of a tag as users expect when attempting to match the patterns in a tags file.
The good news is that there is a way to notify Vim of the encoding used in a tags file with the TAG_FILE_ENCODING pseudo tag.
Two new options have been introduced (--input-encoding=IN and --output-encoding=OUT).
Using the encodings specified with these options, ctags converts input from IN to OUT. ctags uses the converted strings when writing the pattern parts of each tag line. As a result, the tags output is encoded in OUT encoding.
In addition, OUT is specified at the top of the tags file as the value for the TAG_FILE_ENCODING pseudo tag. The default value of OUT is UTF-8.
NOTE: Converted input is NOT passed to language parsers. The parsers still deal with input as a byte sequence.
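The mismatch these options address can be reproduced at the byte level in a few lines of Python. This is only an illustration of why an unconverted pattern fails to match; it is not how ctags performs the conversion (ctags relies on iconv, as described below). The sample source line is made up.

```python
# A hypothetical source line with a non-ASCII character in a comment.
line = "int x = 0; /* caf\u00e9 */"

latin1_bytes = line.encode("latin-1")  # what a legacy-encoded file contains
utf8_bytes = line.encode("utf-8")      # what an editor working in UTF-8 searches for

# Without conversion, the raw latin-1 bytes end up in the pattern field,
# and they differ from the UTF-8 bytes of the same text.
print(latin1_bytes == utf8_bytes)  # False

# Converting from IN (latin-1) to OUT (UTF-8) produces matching bytes.
converted = latin1_bytes.decode("latin-1").encode("utf-8")
print(converted == utf8_bytes)  # True
```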
With --input-encoding-<LANG>=IN, you can specify a specific input encoding for LANG. It overrides the global default value given with --input-encoding.
Example usage can be found in Tmain/{input,output}-encoding-option.d.
Acceptable IN and OUT values can be listed with iconv -l or iconv --list. They are platform dependent.
To enable these options, libiconv is needed on your platform.
On Windows mingw (without msys2), you must specify WITH_ICONV=yes like this:
C:\dev\ctags>mingw32-make -f mk_mingw.mak WITH_ICONV=yes
--list-features tells you whether your ctags executable is linked with libiconv. If it is, iconv appears in the output.
Extra tag entries (--extras)
The --extra option in Exuberant-ctags is renamed to --extras (plural) in Universal-ctags for consistency with --kinds-<LANG> and --fields.
These extra tag entries are newly introduced.
F
Equivalent to --file-scope.
p
Include pseudo tags.
Kinds synchronization
In Universal-ctags, as in Exuberant-ctags, most kinds are parser local; enabling (or disabling) a kind in a parser has no effect on kinds in any other parsers even those with the same name and/or letter.
However, there are exceptions, such as C and C++. C++ can be considered a language extended from C. Therefore it is natural that all kinds defined in the C parser are also defined in the C++ parser. Enabling a kind in the C parser also enables a kind having the same name in the C++ parser, and vice versa.
A kind group is a group of kinds satisfying the following conditions:
Having the same name and letter, and
Being synchronized with each other
A master parser manages the synchronization of a kind group. The MASTER column of --list-kinds-full shows the master parser of the kind.
Internally, a state change (enabled or disabled with --kinds-<LANG>=[+|-]...) of a kind in a kind group is reported to its master parser as an event. Then the master parser updates the state of all kinds in the kind group as specified with the option.
$ ./ctags --list-kinds-full=C++
#LETTER NAME ENABLED REFONLY NROLES MASTER DESCRIPTION
d macro on FALSE 1 C macro definitions
...
$ ./ctags --list-kinds-full=C
#LETTER NAME ENABLED REFONLY NROLES MASTER DESCRIPTION
d macro on FALSE 1 C macro definitions
...
The example output indicates that the d kinds of both the C++ and C parsers are in the same group and that the C parser manages the group.
$ ./ctags --kinds-C++=-d --list-kinds-full=C | head -2
#LETTER NAME ENABLED REFONLY NROLES MASTER DESCRIPTION
d macro off FALSE 1 C macro definitions
$ ./ctags --kinds-C=-d --list-kinds-full=C | head -2
#LETTER NAME ENABLED REFONLY NROLES MASTER DESCRIPTION
d macro off FALSE 1 C macro definitions
$ ./ctags --kinds-C++=-d --list-kinds-full=C++ | head -2
#LETTER NAME ENABLED REFONLY NROLES MASTER DESCRIPTION
d macro off FALSE 1 C macro definitions
$ ./ctags --kinds-C=-d --list-kinds-full=C++ | head -2
#LETTER NAME ENABLED REFONLY NROLES MASTER DESCRIPTION
d macro off FALSE 1 C macro definitions
In the above example, the d kind is disabled via C or C++. Disabling a d kind via one language disables the d kind for the other parser, too.
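The event flow described above can be modeled in a few lines. The following Python sketch is a hypothetical model (the class and method names are invented, not ctags internals): each parser owns its own kind object, the master parser groups kinds with the same letter and name, and a state change reported for any member is applied to the whole group.

```python
from dataclasses import dataclass


@dataclass
class Kind:
    letter: str
    name: str
    enabled: bool = True


class KindGroupMaster:
    """Hypothetical model of a master parser that manages kind groups."""

    def __init__(self):
        self.groups = {}  # (letter, name) -> kinds that must stay in sync

    def register(self, kind):
        self.groups.setdefault((kind.letter, kind.name), []).append(kind)

    def on_state_change(self, kind, enabled):
        # A change to one member is applied to every member of its group.
        for member in self.groups[(kind.letter, kind.name)]:
            member.enabled = enabled


# The C and C++ parsers each have their own "d" (macro) kind object ...
c_macro = Kind("d", "macro")
cxx_macro = Kind("d", "macro")

# ... and the C parser, acting as master, puts both into one group.
master = KindGroupMaster()
master.register(c_macro)
master.register(cxx_macro)

# Rough analogue of --kinds-C++=-d: the change is reported to the master.
master.on_state_change(cxx_macro, False)
print(c_macro.enabled, cxx_macro.enabled)  # False False
```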
--put-field-prefix option
Some fields are newly introduced in Universal-ctags and more will be introduced in the future. Other tags generators may also introduce their own fields.
In such a situation there is a concern about conflicting field names; mixing tags files generated by multiple tags generators including Universal-ctags is difficult.
--put-field-prefix provides a workaround for this use case. When --put-field-prefix is given, ctags adds “UCTAGS” as a prefix to newly introduced fields.
$ cat /tmp/foo.h
#include <stdio.h>
$ ./ctags -o - --extras=+r --fields=+r /tmp/foo.h
stdio.h /tmp/foo.h /^#include <stdio.h>/;" h roles:system
$ ./ctags --put-field-prefix -o - --extras=+r --fields=+r /tmp/foo.h
stdio.h /tmp/foo.h /^#include <stdio.h>/;" h UCTAGSroles:system
In this example, roles is prefixed.
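The effect on a tag line can be sketched as follows. This is a hypothetical Python model, not ctags code: NEW_FIELDS stands in for whatever set of fields a given ctags version treats as newly introduced, and only roles is listed here because it is the field shown in the example above.

```python
# Hypothetical set of "newly introduced" field names; the real set depends
# on the ctags version. Only "roles" is taken from the example above.
NEW_FIELDS = {"roles"}


def format_fields(fields, put_prefix=False):
    """Render extension fields as tab-separated name:value pairs."""
    parts = []
    for name, value in fields:
        if put_prefix and name in NEW_FIELDS:
            name = "UCTAGS" + name  # what --put-field-prefix adds
        parts.append(f"{name}:{value}")
    return "\t".join(parts)


print(format_fields([("roles", "system")]))                   # roles:system
print(format_fields([("roles", "system")], put_prefix=True))  # UCTAGSroles:system
```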
--maxdepth option
--maxdepth limits the depth of directory recursion enabled with the -R option.
--map-<LANG> option
--map-<LANG> is newly introduced to control the file name to language mappings (langmap) with finer granularity than --langmap allows.
A langmap entry is defined as a pair: the name of the language and a file name extension (or pattern).
Here we use “spec” as a generic term representing both file name extensions and patterns.
--langmap maps specs to languages exclusively:
$ ./ctags --langdef=FOO --langmap=FOO:+.ABC \
--langdef=BAR --langmap=BAR:+.ABC \
--list-maps | grep '\*.ABC$'
BAR *.ABC
Though language FOO is added before BAR, only BAR is set as a handler for the spec *.ABC.
Universal-ctags enables multiple parsers to be configured for a spec. The appropriate parser for a given input file can then be chosen by a variety of internal guessing strategies (see “Choosing a proper parser in ctags”).
Let’s see how specs can be mapped non-exclusively with --map-<LANG>:
$ ./ctags --langdef=FOO --map-FOO=+.ABC \
--langdef=BAR --map-BAR=+.ABC \
--list-maps | grep '\*.ABC$'
FOO *.ABC
BAR *.ABC
Both FOO and BAR are registered as handlers for the spec *.ABC.
--map-<LANG> can also be used for removing a langmap entry:
$ ./ctags --langdef=FOO --map-FOO=+.ABC \
--langdef=BAR --map-BAR=+.ABC \
--map-FOO=-.ABC --list-maps | grep '\*.ABC$'
BAR *.ABC
$ ./ctags --langdef=FOO --map-FOO=+.ABC \
--langdef=BAR --map-BAR=+.ABC \
--map-BAR=-.ABC --list-maps | grep '\*.ABC$'
FOO *.ABC
$ ./ctags --langdef=FOO --map-FOO=+.ABC \
--langdef=BAR --map-BAR=+.ABC \
--map-BAR=-.ABC --map-FOO=-.ABC --list-maps | grep '\*.ABC$'
(NOTHING)
--langmap provides a way to manipulate the langmap in a spec-centric manner and --map-<LANG> provides a way to manipulate the langmap in a parser-centric manner.
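The difference between the two options can be summarized with a toy model of the table printed by --list-maps. This Python sketch is an assumption-laden illustration, not ctags code. The class and method names are hypothetical, and specs are stored in the *.ABC form that --list-maps prints.

```python
class LangMap:
    """Toy model of the spec -> language table printed by --list-maps."""

    def __init__(self):
        self.entries = []  # (language, spec) pairs in insertion order

    def langmap_add(self, lang, spec):
        # Like --langmap=LANG:+SPEC: the spec is taken away from other languages.
        self.entries = [(l, s) for (l, s) in self.entries if s != spec]
        self.entries.append((lang, spec))

    def map_add(self, lang, spec):
        # Like --map-LANG=+SPEC: other languages keep the same spec.
        if (lang, spec) not in self.entries:
            self.entries.append((lang, spec))

    def map_remove(self, lang, spec):
        # Like --map-LANG=-SPEC: only this language loses the spec.
        self.entries = [e for e in self.entries if e != (lang, spec)]

    def handlers(self, spec):
        return [l for (l, s) in self.entries if s == spec]


exclusive = LangMap()
exclusive.langmap_add("FOO", "*.ABC")
exclusive.langmap_add("BAR", "*.ABC")
print(exclusive.handlers("*.ABC"))  # ['BAR']

shared = LangMap()
shared.map_add("FOO", "*.ABC")
shared.map_add("BAR", "*.ABC")
print(shared.handlers("*.ABC"))  # ['FOO', 'BAR']
shared.map_remove("FOO", "*.ABC")
print(shared.handlers("*.ABC"))  # ['BAR']
```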
Guessing parser from file contents (-G option)
See the “Choosing a proper parser in ctags” section.
JSON output
Experimental JSON output has been added. --output-format can be used to enable it.
$ ./ctags --output-format=json --fields=-s /tmp/foo.py
{"_type": "tag", "name": "Foo", "path": "/tmp/foo.py", "pattern": "/^class Foo:$/", "kind": "class"}
{"_type": "tag", "name": "doIt", "path": "/tmp/foo.py", "pattern": "/^ def doIt():$/", "kind": "member"}
See JSON output for more details.
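Each output line is a standalone JSON object, so a client can consume the output one line at a time. In the Python sketch below, the two sample lines from the example above are embedded so that the sketch runs without a ctags binary. In practice you would read the lines from a ctags subprocess. The _type check is a defensive assumption in case other record types appear in the stream.

```python
import json

# The two lines from the example output above; each line is one JSON object.
output = """\
{"_type": "tag", "name": "Foo", "path": "/tmp/foo.py", "pattern": "/^class Foo:$/", "kind": "class"}
{"_type": "tag", "name": "doIt", "path": "/tmp/foo.py", "pattern": "/^ def doIt():$/", "kind": "member"}
"""

records = [json.loads(line) for line in output.splitlines() if line.strip()]

# Keep only tag records, then select members.
tags = [r for r in records if r.get("_type") == "tag"]
members = [t["name"] for t in tags if t.get("kind") == "member"]
print(members)  # ['doIt']
```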
“always” and “never” as an argument for --tag-relative
Even if “yes” is specified as an option argument for --tag-relative, absolute paths are used in tags output if an input is given as an absolute path. This behavior is expected in Exuberant-ctags, as written in its man page.
In addition to “yes” and “no”, Universal-ctags accepts “never” and “always”.
If “never” is given, absolute paths are used in tags output regardless of how the input file(s) are specified. If “always” is given, relative paths are always used.
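The following Python sketch summarizes the four arguments as a decision table. It is a model under stated assumptions, not ctags code. The function and its parameters are hypothetical. "Relative" is assumed to mean relative to the directory containing the tags file, and "no" is assumed to write the path exactly as given. posixpath is used so that the sketch behaves the same on every platform.

```python
import posixpath


def tag_path(input_path, cwd, tags_dir, mode):
    """Hypothetical model of the path --tag-relative=MODE records for an input.

    cwd and tags_dir are absolute directories.
    """
    full = posixpath.normpath(posixpath.join(cwd, input_path))
    if mode == "never":
        return full                                # always absolute
    if mode == "always":
        return posixpath.relpath(full, tags_dir)   # always relative to the tags file
    if mode == "yes":
        if posixpath.isabs(input_path):
            return full                            # absolute input stays absolute
        return posixpath.relpath(full, tags_dir)
    return input_path                              # "no": path as given


# Input given as an absolute path, tags file written to /src/build:
for mode in ("yes", "always", "never"):
    print(mode, tag_path("/src/a.c", "/src", "/src/build", mode))
# yes /src/a.c
# always ../a.c
# never /src/a.c
```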
Defining a macro in CPreProcessor input
The newly introduced -D option extends the functionality provided by the -I option.
-D emulates the behaviour of the corresponding gcc option: it defines a C preprocessor macro. All types of macros are supported, including the ones with parameters and variable arguments. Stringification, token pasting and recursive macro expansion are also supported.
-I is now simply a backward-compatible syntax to define a macro with no replacement.
Some examples follow.
$ ctags ... -D IGNORE_THIS ...
With this command line, the following C/C++ input
int IGNORE_THIS a;
will be processed as if it was
int a;
Defining a macro with parameters uses the following syntax:
$ ctags ... -D "foreach(arg)=for(arg;;)" ...
This example defines foreach(arg) with the replacement for(arg;;). So the following C/C++ input
foreach(char * p,pointers)
{
}
is processed by the new C/C++ parser as:
for(char * p;;)
{
}
and the p local variable can be extracted.
The previous command line uses quotes because macros generally contain characters that the shell treats specially. You may need additional escaping.
Token pasting is performed by the ## operator, just like in the normal C preprocessor.
$ ctags ... -D "DECLARE_FUNCTION(prefix)=int prefix ## Call();"
So the following code
DECLARE_FUNCTION(a)
DECLARE_FUNCTION(b)
will be processed as
int aCall();
int bCall();
Macros with variable arguments use the gcc __VA_ARGS__ syntax.
$ ctags ... -D "DECLARE_FUNCTION(name,...)=int name(__VA_ARGS__);"
So the following code
DECLARE_FUNCTION(x,int a,int b)
will be processed as
int x(int a,int b);
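The Python sketch below reproduces the two function-like examples above to show the order of operations: parameter substitution, then ## pasting. It is not the CPreProcessor implementation. parse_define and expand are hypothetical helpers, and the sketch skips stringification, recursive expansion and argument splitting, so the caller passes the arguments as a ready-made list.

```python
import re


def parse_define(definition):
    """Split a -D argument like "NAME(a,b)=body" into (name, params, body).

    params is None for an object-like macro such as "IGNORE_THIS".
    """
    head, _, body = definition.partition("=")
    m = re.fullmatch(r"\s*(\w+)\s*(?:\(([^)]*)\))?\s*", head)
    if m is None:
        raise ValueError(f"bad macro definition: {definition!r}")
    params = None
    if m.group(2) is not None:
        params = [p.strip() for p in m.group(2).split(",")]
    return m.group(1), params, body


def expand(params, body, args):
    """Substitute arguments into body, then apply ## token pasting."""
    params = params or []
    named = [p for p in params if p != "..."]
    mapping = dict(zip(named, args))
    if "..." in params:
        # Arguments beyond the named parameters become __VA_ARGS__.
        mapping["__VA_ARGS__"] = ",".join(args[len(named):])
    # Replace whole identifiers that name a parameter.
    out = re.sub(r"\w+", lambda m: mapping.get(m.group(0), m.group(0)), body)
    # Token pasting: "a ## Call" becomes "aCall".
    return re.sub(r"\s*##\s*", "", out)


name, params, body = parse_define("DECLARE_FUNCTION(prefix)=int prefix ## Call();")
print(expand(params, body, ["a"]))  # int aCall();

name, params, body = parse_define("DECLARE_FUNCTION(name,...)=int name(__VA_ARGS__);")
print(expand(params, body, ["x", "int a", "int b"]))  # int x(int a,int b);
```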
--_interactive Mode
A new --_interactive option launches a JSON based command REPL which can be used to control ctags generation programmatically. See --_interactive Mode for more details.
--_interactive=sandbox adds a seccomp filter. See sandbox submode for more details.
Defining a kind
A new --kinddef-<LANG>=letter,name,description option reduces the typing needed when defining a regex pattern with --regex-<LANG>=, and keeps dynamically defined kinds in a language consistent.
A kind letter defined with --kinddef-<LANG> can be referred to in --regex-<LANG>.
Previously you had to write in your optlib:
--regex-elm=/^([[:lower:]_][[:alnum:]_]*)[^=]*=$/\1/f,function,Functions/{scope=set}
--regex-elm=/^[[:blank:]]+([[:lower:]_][[:alnum:]_]*)[^=]*=$/\1/f,function,Functions/{scope=ref}
With the new --kinddef-<LANG> option, you can write the same thing as:
--kinddef-elm=f,function,Functions
--regex-elm=/^([[:lower:]_][[:alnum:]_]*)[^=]*=$/\1/f/{scope=set}
--regex-elm=/^[[:blank:]]+([[:lower:]_][[:alnum:]_]*)[^=]*=$/\1/f/{scope=ref}
We can now say that “kind” is a first class object in Universal-ctags.
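Both versions above capture the same names. --kinddef only changes how the kind is declared. To see what the two Elm patterns match, here is a Python sketch that translates them under one assumption: the POSIX classes are rewritten as ASCII ranges (C locale), because Python's re module has no [[:lower:]]-style classes. The Elm snippet is made up.

```python
import re

# POSIX classes rewritten for Python's re (assuming the C locale):
#   [[:lower:]] -> [a-z], [[:alnum:]] -> [A-Za-z0-9], [[:blank:]] -> [ \t]
TOP_LEVEL = re.compile(r"^([a-z_][A-Za-z0-9_]*)[^=]*=$")  # {scope=set}
NESTED = re.compile(r"^[ \t]+([a-z_][A-Za-z0-9_]*)[^=]*=$")  # {scope=ref}

# A made-up Elm snippet.
source = """\
update msg model =
    let
        helper n =
            n + 1
    in
    helper model
"""

captured = []
for line in source.splitlines():
    if m := TOP_LEVEL.match(line):
        captured.append(("set", m.group(1)))
    elif m := NESTED.match(line):
        captured.append(("ref", m.group(1)))

print(captured)  # [('set', 'update'), ('ref', 'helper')]
```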
Defining an extra
A new --_extradef-<LANG>=name,description option allows you to define a parser's own extra, whose on/off state can be referred to from a regex based parser for <LANG>.
See Conditional tagging with extras for more details.
Defining a subparser
Basic
For the concept of subparsers, see Tagging definitions of higher(upper) level language (sub/base).
With the base long flag of the --langdef=<LANG> option, you can define a subparser for a specified base parser. Combined with the --kinddef-<LANG> and --regex-<LANG> options, this lets you extend an existing parser without risk of kind conflicts.
Let’s see an example.
input.c
static int set_one_prio(struct task_struct *p, int niceval, int error)
{
}
SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
{
...;
}
$ ./ctags --options=NONE -x --_xformat="%20N %10K %10l" -o - input.c
ctags: Notice: No options will be read from files or environment
set_one_prio function C
SYSCALL_DEFINE3 function C
The C parser doesn’t understand that SYSCALL_DEFINE3 is a macro for defining a system call entry point.
Let’s define a linux subparser that uses the C parser as its base parser:
$ cat linux.ctags
--langdef=linux{base=C}
--kinddef-linux=s,syscall,system calls
--regex-linux=/SYSCALL_DEFINE[0-9]\(([^, )]+)[\),]*/\1/s/
With the linux parser, the output changes as follows:
$ ./ctags --options=NONE --options=./linux.ctags -x --_xformat="%20N %10K %10l" -o - input.c
ctags: Notice: No options will be read from files or environment
setpriority syscall linux
set_one_prio function C
SYSCALL_DEFINE3 function C
setpriority is recognized as a syscall of linux.
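To check what the regex in linux.ctags captures without running ctags, it can be applied to the same input in Python. One assumption applies. ctags evaluates the pattern as a POSIX extended regex, and Python's re reads it almost identically. The one difference is that POSIX treats the backslash inside [\),] as a literal character, which does not matter for this input.

```python
import re

# The pattern from linux.ctags, unchanged.
SYSCALL = re.compile(r"SYSCALL_DEFINE[0-9]\(([^, )]+)[\),]*")

source = """\
static int set_one_prio(struct task_struct *p, int niceval, int error)
{
}
SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
{
    ...;
}
"""

# Apply the pattern line by line and collect the first group.
syscalls = [m.group(1) for line in source.splitlines()
            if (m := SYSCALL.search(line))]
print(syscalls)  # ['setpriority']
```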
Using only --regex-C=... you can capture setpriority. However, there were concerns about kind conflicts: when introducing a new kind with --regex-C=..., you cannot use a letter or name already used by the C parser or by --regex-C=... options specified elsewhere.
You can use a newly defined subparser as a new namespace for kinds. In addition, you can enable or disable the subparser with the --languages=[+|-] option.
Directions
As explained in Tagging definitions of higher(upper) level language (sub/base), you can choose the direction(s) in which a base parser and a guest parser work together, using long flags placed after --langdef=Foo{base=Bar}.
| C level notation | Command line long flag |
|---|---|
| SUBPARSER_BASE_RUNS_SUB | shared |
| SUBPARSER_SUB_RUNS_BASE | dedicated |
| SUBPARSER_BI_DIRECTION | bidirectional |
Let’s look at the actual differences in behavior.
The examples are taken from #1409, submitted by @sgraham to the Universal-ctags repository on GitHub.
input.cc and input.mojom are input files, and have the same contents:
ABC();
int main(void)
{
}
The C++ parser can capture main as a function. The mojom subparser, defined later, runs on top of the C++ parser to capture ABC.
dedicated combination
When {dedicated} is specified, only tags captured by the C++ parser are recorded to the tags file for input.cc. For input.mojom, tags captured by both the C++ parser and the mojom parser are recorded.
mojom-dedicated.ctags:
tags for input.cc:
main input.cc /^int main(void)$/;" f language:C++ typeref:typename:int
tags for input.mojom:
ABC input.mojom /^ ABC();$/;" f language:mojom
main input.mojom /^int main(void)$/;" f language:C++ typeref:typename:int
The mojom parser works only when a .mojom file is given as input.
bidirectional combination
When {bidirectional} is specified, tags captured by both the C++ parser and the mojom parser are recorded to the tags file for either input, input.cc or input.mojom.
mojom-bidirectional.ctags:
tags for input.cc:
ABC input.cc /^ ABC();$/;" f language:mojom
main input.cc /^int main(void)$/;" f language:C++ typeref:typename:int
tags for input.mojom:
ABC input.mojom /^ ABC();$/;" f language:mojom
main input.mojom /^int main(void)$/;" f language:C++ typeref:typename:int
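The two combinations shown above can be condensed into a small Python decision function. This is a summary under stated assumptions, not ctags logic. The function name is hypothetical, only input.cc and input.mojom are considered, and {shared} is left out because no example of it is given here.

```python
def parsers_for(filename, direction):
    """Which parsers record tags for the example inputs, per the results above."""
    ext = filename.rsplit(".", 1)[-1]
    if direction == "dedicated":
        # The sub runs the base: .mojom gets both parsers, .cc only C++.
        return ["C++", "mojom"] if ext == "mojom" else ["C++"]
    if direction == "bidirectional":
        # The base runs the sub and the sub runs the base: both, for either file.
        return ["C++", "mojom"]
    raise ValueError(f"direction not modeled in this sketch: {direction}")


for d in ("dedicated", "bidirectional"):
    for f in ("input.cc", "input.mojom"):
        print(d, f, parsers_for(f, d))
```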
Listing subparsers
Subparsers can be listed with --list-subparsers:
$ ./ctags --options=NONE --options=./linux.ctags --list-subparsers=C
ctags: Notice: No options will be read from files or environment
#NAME BASEPARSER DIRECTION
linux C base => sub {shared}
Including line number to pattern field
--excmd=type specifies how ctags prints the pattern field in a tags file. Universal-ctags introduces combine as a new type.
If combine is given, Universal-ctags combines an adjusted line number and the pattern, separated by a semicolon, and uses the result as the pattern field. ctags adjusts the line number by decrementing it by one, or by incrementing it by one if the -B option is given. This adjustment lets a client tool like vim search for the pattern starting from the line before (or after) the line where the tag is defined.
Let’s see an example.
$ cat -n /tmp/foo.cc
1 int foo(int i)
2 {
3 return i;
4 }
5
6 int foo(int i, int j)
7 {
8 return i + j;
9 }
$ ./ctags --excmd=combine -o - /tmp/foo.cc
foo /tmp/foo.cc 0;/^int foo(int i)$/;" f typeref:typename:int
foo /tmp/foo.cc 5;/^int foo(int i, int j)$/;" f typeref:typename:int
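The arithmetic is small enough to state as code. The Python sketch below is a hypothetical helper that reproduces the two pattern fields in the output above from the line numbers shown by cat -n. The backward flag stands in for -B, and the sketch keeps the given pattern unchanged.

```python
def combine_excmd(line_number, pattern, backward=False):
    """Build a --excmd=combine style field: adjusted line number, ';', pattern.

    The line number is decremented by one (incremented when backward is True,
    standing in for -B).
    """
    adjusted = line_number + 1 if backward else line_number - 1
    return f"{adjusted};{pattern}"


print(combine_excmd(1, "/^int foo(int i)$/"))         # 0;/^int foo(int i)$/
print(combine_excmd(6, "/^int foo(int i, int j)$/"))  # 5;/^int foo(int i, int j)$/
```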
Automatic parser selection
See the “Choosing a proper parser in ctags” section.
Incompatible changes to file name pattern and extension handling
When guessing a proper parser for a given input file, Exuberant-ctags tests file name patterns AFTER file extensions (e-order). Universal-ctags does this differently; it tests file name patterns BEFORE file extensions (u-order).
This incompatible change is introduced to deal with the following situation: “build.xml” is an input file. The Ant parser declares it handles a file name pattern “build.xml” and another parser, Foo, declares it handles a file extension “xml”.
Which parser should be used for parsing the input? The user may want to use the Ant parser because the pattern it declares is more specific than the extension Foo declares. However, in e-order, the other parser, Foo, is chosen.
So Universal-ctags uses the u-order even though it introduces an incompatibility.
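The effect of the ordering can be modeled in Python with the build.xml scenario. This is a simplified model, not the ctags parser selection code. The two mapping tables are hypothetical and contain only the Ant and Foo entries from the text, and fnmatchcase is used so that matching is case sensitive on every platform.

```python
from fnmatch import fnmatchcase

# Hypothetical mappings from the "build.xml" scenario above.
PATTERNS = {"Ant": ["build.xml"]}  # file name patterns
EXTENSIONS = {"Foo": ["xml"]}      # file extensions


def by_pattern(filename):
    for lang, pats in PATTERNS.items():
        if any(fnmatchcase(filename, p) for p in pats):
            return lang
    return None


def by_extension(filename):
    ext = filename.rsplit(".", 1)[-1] if "." in filename else ""
    for lang, exts in EXTENSIONS.items():
        if ext in exts:
            return lang
    return None


def guess(filename, order):
    """order "e": extensions first (e-order); "u": patterns first (u-order)."""
    steps = (by_extension, by_pattern) if order == "e" else (by_pattern, by_extension)
    for step in steps:
        lang = step(filename)
        if lang is not None:
            return lang
    return None


print(guess("build.xml", "e"))  # Foo
print(guess("build.xml", "u"))  # Ant
```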
Parser own fields
A tag has a name, an input file name, and a pattern as basic information. Some fields, like language:, signature:, etc., are attached to the tag as optional information.
In Exuberant-ctags, fields are common to all languages. Universal-ctags extends the concept of fields; a parser can define its own field. This extension was proposed by @pragmaware in #857.
To implement parser own fields, the options for listing and enabling/disabling fields have also been extended.
In the output of --list-fields, the owner of the field is printed in the LANGUAGE column:
$ ./ctags --list-fields
#LETTER NAME ENABLED LANGUAGE XFMT DESCRIPTION
...
- end off C TRUE end lines of various constructs
- properties off C TRUE properties (static, inline, mutable,...)
- end off C++ TRUE end lines of various constructs
- template off C++ TRUE template parameters
- captures off C++ TRUE lambda capture list
- properties off C++ TRUE properties (static, virtual, inline, mutable,...)
- sectionMarker off reStructuredText TRUE character used for declaring section
- version off Maven2 TRUE version of artifact
e.g. reStructuredText is the owner of the sectionMarker field and both C and C++ own the end field.
--list-fields takes one optional argument, LANGUAGE. If it is given, --list-fields prints only the fields for that parser:
$ ./ctags --list-fields=Maven2
#LETTER NAME ENABLED LANGUAGE XFMT DESCRIPTION
- version off Maven2 TRUE version of artifact
A parser own field only has a long name, no letter. For enabling/disabling such fields, the name must be passed to --fields-<LANG>.
e.g. for enabling the sectionMarker field owned by the reStructuredText parser, use the following command line:
$ ./ctags --fields-reStructuredText=+{sectionMarker} ...
The wild card notation can be used for enabling/disabling parser own fields, too. The following example enables all fields owned by the C++ parser.
$ ./ctags --fields-C++='*' ...
* can also be used for specifying languages.
The next example is for enabling end fields for all languages which have such a field.
$ ./ctags --fields-'*'=+'{end}' ...
...
In this case, using wild card notation to specify the language, not only fields owned by parsers but also common fields having the name specified (end in this example) are enabled/disabled.
Using the wild card notation to specify the language helps avoid incompatibilities between versions of Universal-ctags itself (self-incompatibility).
In Universal-ctags development, a parser developer may add a new parser own field for a certain language. Sometimes other developers then recognize it is meaningful not only for the original language but also for other languages. In this case the field may be promoted to a common field. Such a promotion will break the command line compatibility for --fields-<LANG> usage. The wild card for <LANG> will help in avoiding this unwanted effect of the promotion.
With respect to the tags file format, nothing is changed when introducing parser own fields; <fieldname>:<value> is used as before and the name of field owner is never prefixed. The language: field of the tag identifies the owner.
Parser own extras
As the man page of Exuberant-ctags says, the --extra option (--extras in Universal-ctags) specifies whether to include extra tag entries for certain kinds of information. This option is available in Universal-ctags, too.
In Universal-ctags it is extended; a parser can define its own extra flags. They can be controlled with --extras-<LANG>=[+|-]{...}.
See some examples:
$ ./ctags --list-extras
#LETTER NAME ENABLED LANGUAGE DESCRIPTION
F fileScope TRUE NONE Include tags ...
f inputFile FALSE NONE Include an entry ...
p pseudo FALSE NONE Include pseudo tags
q qualified FALSE NONE Include an extra ...
r reference FALSE NONE Include reference tags
g guest FALSE NONE Include tags ...
- whitespaceSwapped TRUE Robot Include tags swapping ...
See the LANGUAGE column. NONE means the extra flags are language independent (common). They can be enabled or disabled with --extras= as before.
Look at whitespaceSwapped. Its language is Robot. This flag is enabled by default but can be disabled with --extras-Robot=-{whitespaceSwapped}.
$ cat input.robot
*** Keywords ***
it's ok to be correct
Python_keyword_2
$ ./ctags -o - input.robot
it's ok to be correct input.robot /^it's ok to be correct$/;" k
it's_ok_to_be_correct input.robot /^it's ok to be correct$/;" k
$ ./ctags -o - --extras-Robot=-'{whitespaceSwapped}' input.robot
it's ok to be correct input.robot /^it's ok to be correct$/;" k
When the flag is disabled, the name it’s_ok_to_be_correct is not included in the tags output. In other words, the name it’s_ok_to_be_correct is derived from the name it’s ok to be correct when the extra flag is enabled.
Discussion
(This subsection should move to somewhere for developers.)
The question is: what are extra tag entries? As far as I know, nobody has answered this explicitly. I have two ideas in Universal-ctags. I write “ideas”, not “definitions”, here because existing parsers don’t follow them. The parsers are kept as they are for various reasons, but the ideas may be a good guide for people who want to write a new parser or extend an existing one.
The first idea is that if a tag entry's name appears in the input file as is, the entry is NOT an extra. (If you want to control the inclusion of such entries, the classical --kinds-<LANG>=[+|-]... is what you want.)
Qualified tags, whose inclusion is controlled by --extras=+q, are explained well by this idea.
Let’s see an example:
$ cat input.py
class Foo:
def func (self):
pass
$ ./ctags -o - --extras=+q --fields=+E input.py
Foo input.py /^class Foo:$/;" c
Foo.func input.py /^ def func (self):$/;" m class:Foo extra:qualified
func input.py /^ def func (self):$/;" m class:Foo
Foo and func appear in input.py, so they are not extra tags. On the other hand, Foo.func does not appear in input.py as is. The name is generated by ctags as a qualified extra tag entry. The whitespaceSwapped extra flag of the Robot parser also aligns well with this idea.
Not all parsers follow this idea.
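The first idea can be written as a one-line heuristic. The Python sketch below is only an illustration and not a rule that ctags applies. looks_like_extra is a hypothetical name, and a plain substring test is cruder than real tokenization. The sample inputs follow the Python and Robot examples above, with their indentation assumed.

```python
def looks_like_extra(tag_name, source_text):
    """First idea: a tag whose name does not appear verbatim in the input is an extra."""
    return tag_name not in source_text


python_src = "class Foo:\n    def func (self):\n        pass\n"
robot_src = "*** Keywords ***\nit's ok to be correct\n    Python_keyword_2\n"

print(looks_like_extra("Foo", python_src))                   # False
print(looks_like_extra("func", python_src))                  # False
print(looks_like_extra("Foo.func", python_src))              # True (qualified)
print(looks_like_extra("it's_ok_to_be_correct", robot_src))  # True (whitespaceSwapped)
```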
$ cat input.cc
class A
{
A operator+ (int);
};
$ ./ctags --kinds-all='*' --fields= -o - input.cc
A input.cc /^class A$/
operator + input.cc /^ A operator+ (int);$/
In this example operator+ appears in input.cc. On the other hand, operator + appears in the ctags output as a non-extra tag entry. Note the whitespace between the keyword operator and the + operator. This is an exception to the first idea.
The second idea is that if the inclusion of a tag cannot be controlled well with --kinds-<LANG>=[+|-]..., the tag may be an extra.
$ cat input.c
static int foo (void)
{
return 0;
}
int bar (void)
{
return 1;
}
$ ./ctags --sort=no -o - --extras=+F input.c
foo input.c /^static int foo (void)$/;" f typeref:typename:int file:
bar input.c /^int bar (void)$/;" f typeref:typename:int
$ ./ctags -o - --extras=-F input.c
bar input.c /^int bar (void)$/;" f typeref:typename:int
$
Function foo of the C language is included only when the F extra flag is enabled. Both foo and bar are functions. Their inclusion can be controlled with the f kind of the C language: --kinds-C=[+|-]f.
The difference between the static modifier and the implicit extern modifier in a function definition is handled by the F extra flag.
Basically, the kind concept is for handling the kinds of language objects: functions, variables, macros, types, etc. The extra concept can handle other aspects, like scope (static or extern).
However, a parser developer can take another approach instead of introducing a parser own extra: one can provide staticFunction and exportedFunction as kinds of one’s parser. The second idea is just a guide; the parser developer must decide on a suitable approach for the target language.
Anyway, in the second idea, --extras is for controlling the inclusion of tags. If what you want is not about inclusion, --param-<LANG> can be used as the last resort.
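The division of labor in the second idea can be sketched as a filter with two independent conditions. In this Python model, the Tag type and select function are hypothetical, and the two tags mirror foo and bar from the C example. The kind decides what is tagged, and the F extra decides whether file-scoped tags are kept.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    name: str
    kind: str
    file_scope: bool  # True for static functions (the "file:" field)


TAGS = [Tag("foo", "f", True), Tag("bar", "f", False)]


def select(tags, enabled_kinds, extras):
    """Keep tags whose kind is enabled; file-scoped ones also need the F extra."""
    return [
        t.name for t in tags
        if t.kind in enabled_kinds and (not t.file_scope or "F" in extras)
    ]


print(select(TAGS, {"f"}, {"F"}))  # ['foo', 'bar']
print(select(TAGS, {"f"}, set()))  # ['bar']
print(select(TAGS, set(), {"F"}))  # []
```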
Parser own parameter
To control the details of a parser, the --param-<LANG> option is introduced.
--kinds-<LANG>, --fields-<LANG>, and --extras-<LANG> can be used for customizing the behavior of a parser specified with <LANG>.
--param-<LANG> should be used for aspects of the parser that the options (kinds, fields, extras) cannot handle well.
A parser defines a set of parameters. Each parameter has a name and takes an argument. A user can set a parameter with the following notation:
--param-<LANG>:name=arg
An example of specifying a parameter:
--param-CPreProcessor:if0=true
Here if0 is the name of a parameter of the CPreProcessor parser and true is its value.
All available parameters can be listed with the --list-params option.
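The notation is regular enough to split mechanically. Here is a hypothetical Python helper that breaks such an option into its three parts. parse_param is not part of ctags, it follows the colon-separated form shown in this section, and it does not validate the name against --list-params.

```python
import re

# Notation from the text: --param-<LANG>:name=arg
PARAM = re.compile(r"--param-([^:=]+):([^=]+)=(.*)")


def parse_param(option):
    """Return (language, parameter name, argument) for a --param option."""
    m = PARAM.fullmatch(option)
    if m is None:
        raise ValueError(f"not a --param option: {option!r}")
    return m.groups()


print(parse_param("--param-CPreProcessor:if0=true"))  # ('CPreProcessor', 'if0', 'true')
```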
$ ./ctags --list-params
#PARSER NAME DESCRIPTION
CPreProcessor if0 examine code within "#if 0" branch (true or [false])
CPreProcessor ignore a token to be specially handled
(At this time only CPreProcessor parser has parameters.)
Customizing xref output
The --_xformat option allows a user to customize the cross reference (xref) output enabled with -x.
--_xformat=FORMAT
The notation for FORMAT is similar to that employed by printf(3) in
the C language; % represents a slot which is substituted with a
field value when printing. You can specify multiple slots in FORMAT.
Here field means an item listed with the --list-fields option.
The notation of a slot:
%[-][.][WIDTH-AND-ADJUSTMENT]FIELD-SPECIFIER
FIELD-SPECIFIER
specifies a field whose value is printed.
Short notation and long notation are available. They can be mixed in a FORMAT. When a field is specified with either notation, one or more fields are activated internally.
The short notation is just a letter listed in the LETTER column of the --list-fields output.
The long notation is a name string surrounded by braces({ and }). The name string is listed in the NAME column of the output of the same option. To specify a field owned by a parser, prepend the parser name to the name string with . as a separator.
Wild card (*) can be used where a parser name is specified. In this case both common and parser own fields are activated and printed. If a common field and a parser own field have the same name, the common field has higher priority.
WIDTH-AND-ADJUSTMENT is a positive number. The value of the number is used as the width of the column where a field is printed. The printing is right adjusted by default, and left adjusted when - is given as prefix. The output is not truncated by default even if its field width is specified and smaller than width of output value. For truncating the output to the specified width, use . as prefix.
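The slot rules above can be illustrated with a small renderer. This Python sketch is a simplified model, not ctags' implementation. render is a hypothetical function, field values come from a plain dict keyed by letter or long name, and a missing field is printed as an empty string, which is an assumption.

```python
import re

# One slot: %[-][.][WIDTH]FIELD, where FIELD is a letter or a {long name}.
SLOT = re.compile(r"%(-)?(\.)?(\d+)?(\{[^}]+\}|[A-Za-z])")


def render(fmt, fields):
    """Fill each slot of fmt from fields; other characters are printed as is."""
    def fill(m):
        left, truncate, width, spec = m.groups()
        key = spec[1:-1] if spec.startswith("{") else spec
        value = str(fields.get(key, ""))
        if width is None:
            return value
        w = int(width)
        if truncate:
            value = value[:w]  # "." prefix: cut the value to the column width
        return value.ljust(w) if left else value.rjust(w)
    return SLOT.sub(fill, fmt)


row = {"N": "CLOCKS_PER_SEC", "n": 360, "input": "main/main.c"}
print(render("%-20N %4n %-16{input}|", row))
print(render("%-20N [%10{C.properties}]", {"N": "batchMakeTags", "C.properties": "static"}))
```

Without the "." prefix a long value overflows its column instead of being cut, which matches the default described above.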
An example of specifying common fields:
$ ./ctags -x --_xformat="%-20N %4n %-16{input}|" main/main.c | head
CLOCKS_PER_SEC 360 main/main.c |
CLOCKS_PER_SEC 364 main/main.c |
CLOCK_AVAILABLE 358 main/main.c |
CLOCK_AVAILABLE 363 main/main.c |
Totals 87 main/main.c |
__anonae81ef0f0108 87 main/main.c |
addTotals 100 main/main.c |
batchMakeTags 436 main/main.c |
bytes 87 main/main.c |
clock 365 main/main.c |
Here %-20N %4n %-16{input}| is a format string. Let’s look at the elements of the format.
%-20N
The short notation is used here. The element means filling the slot with the name of the tag. The width of the column is 20 characters and left adjusted.
%4n
The short notation is used here. The element means filling the slot with the line number of the tag. The width of the column is 4 characters and right adjusted.
%-16{input}
The long notation is used here. The element means filling the slot with the input file name where the tag is defined. The width of column is 16 characters and left adjusted.
|
Printed as is.
Another example of specifying parser own fields:
$ ./ctags -x --_xformat="%-20N [%10{C.properties}]" main/main.c
CLOCKS_PER_SEC [ ]
CLOCK_AVAILABLE [ ]
Totals [ ]
__anonae81ef0f0108 [ ]
addTotals [ extern]
batchMakeTags [ static]
bytes [ ]
clock [ ]
clock [ static]
...
Here “%-20N [%10{C.properties}]” is a format string. Let’s look at the elements of the format.
%-20N
Already explained in the first example.
[ and ]
Printed as is.
%10{C.properties}
The long notation is used here. The element means filling the slot with the value of the properties field of the C parser. The width of the column is 10 characters and right adjusted.
Incompatible changes in command line
-D option
For a ctags binary built with debugging output enabled at the configuration stage, -D was used for specifying the level of debugging output. It has been changed to -d. This change is not critical because the -D option was not described in the ctags.1 man page.
Instead, -D is used for defining a macro in the CPreProcessor parser.
Skipping utf-8 BOM
The three-byte sequence \xEF\xBB\xBF (the UTF-8 BOM) at the head of an input file is skipped when parsing.
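The check amounts to a prefix test on the raw bytes. Here is a minimal Python sketch of that behavior. strip_bom is a hypothetical helper, not ctags code.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark, leaving other input untouched."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data


print(strip_bom(b"\xef\xbb\xbfint main(void);"))  # b'int main(void);'
print(strip_bom(b"int main(void);"))               # b'int main(void);'
```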
TODO:
Do the same in the parser guessing and selection stage.
Reflect the BOM detection in the encoding option.