Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cxx: extract parameters in a prototype #3063

Merged

Conversation

masatake
Copy link
Member

Close #3045.

Parameters in functions were tagged.
This change extends it.
With this change, parameters in prototypes are tagged.

It is usual that a parameter in a prototype has no name. In such a case, this change generates an anonymous name.

Close universal-ctags#3060.

input.h:

	template <class T> int f (T t);

The scope kind for T was function.
It should be prototype.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
C++ rejects using "this" as a nmae of a parameter.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake masatake requested a review from pragmaware June 10, 2021 22:34
@codecov
Copy link

codecov bot commented Jun 10, 2021

Codecov Report

Merging #3063 (a22956a) into master (d377176) will increase coverage by 0.04%.
The diff coverage is 96.22%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3063      +/-   ##
==========================================
+ Coverage   87.33%   87.37%   +0.04%     
==========================================
  Files         199      199              
  Lines       47611    47732     +121     
==========================================
+ Hits        41579    41704     +125     
+ Misses       6032     6028       -4     
Impacted Files Coverage Δ
parsers/cxx/cxx_parser_function.c 92.72% <94.44%> (+0.04%) ⬆️
parsers/cxx/cxx_parser.c 85.42% <100.00%> (+0.08%) ⬆️
parsers/cxx/cxx_scope.c 88.67% <100.00%> (+0.21%) ⬆️
parsers/cxx/cxx_tag.c 90.16% <100.00%> (ø)
parsers/cxx/cxx_token.c 90.12% <100.00%> (+1.23%) ⬆️
parsers/jscript.c 97.15% <0.00%> (ø)
main/parse.c 95.86% <0.00%> (+<0.01%) ⬆️
main/promise.c 97.60% <0.00%> (+0.96%) ⬆️
parsers/sql.c 72.81% <0.00%> (+1.62%) ⬆️
main/strlist.c 78.76% <0.00%> (+2.61%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f7dce57...a22956a. Read the comment docs.

@masatake masatake force-pushed the cxx-parameters-in-prototypes branch from fe4ab03 to 93badee Compare June 11, 2021 00:31
@pragmaware
Copy link
Contributor

Currently the function parameters are enabled via +z which is "function parameters inside function definitions".
We have to change the help string to include prototypes too.... but I wonder if it makes sense to hide these behind a different "switch" at all so one could enable prototypes and function parameters but not the prototype parameters.
The same consideration could extend also to the template arguments related to prototypes...

@masatake
Copy link
Member Author

Currently the function parameters are enabled via +z which is "function parameters inside function definitions".
We have to change the help string to include prototypes too.... but I wonder if it makes sense to hide these behind a different "switch" at all so one could enable prototypes and function parameters but not the prototype parameters.

I think using the same switch 'z' for both function parameters and prototype parameters is better.
Other than scope, I think they are the same.
About template parameter, using different switch 'Z' is good for me. For me, template parameter looks very different concept from function and prototype parameter. Maybe because I'm not familier with C++. Anyway we already introduced Z. So we should not change it.

The following shows how the pull request works:

$ cat input.c
cat input.c
void p(int x);

void f(int y)
{
}
$ ./ctags -o - input.c
./ctags -o - input.c
f	input.c	/^void f(int y)$/;"	f	language:C	typeref:typename:void
$ ./ctags -o - --kinds-C=+p input.c
./ctags -o - --kinds-C=+p input.c
f	input.c	/^void f(int y)$/;"	f	language:C	typeref:typename:void
p	input.c	/^void p(int x);$/;"	p	language:C	typeref:typename:void	file:
$ ./ctags -o - --kinds-C=+z input.c
./ctags -o - --kinds-C=+z input.c
f	input.c	/^void f(int y)$/;"	f	language:C	typeref:typename:void
y	input.c	/^void f(int y)$/;"	z	language:C	function:f	typeref:typename:int	file:
$ ./ctags -o - --kinds-C=+zp input.c
./ctags -o - --kinds-C=+zp input.c
f	input.c	/^void f(int y)$/;"	f	language:C	typeref:typename:void
p	input.c	/^void p(int x);$/;"	p	language:C	typeref:typename:void	file:
x	input.c	/^void p(int x);$/;"	z	language:C	prototype:p	typeref:typename:int	file:
y	input.c	/^void f(int y)$/;"	z	language:C	function:f	typeref:typename:int	file:
$ ./ctags -o - --kinds-C=+zp-f input.c
./ctags -o - --kinds-C=+zp-f input.c
p	input.c	/^void p(int x);$/;"	p	language:C	typeref:typename:void	file:
x	input.c	/^void p(int x);$/;"	z	language:C	prototype:p	typeref:typename:int	file:
y	input.c	/^void f(int y)$/;"	z	language:C	function:f	typeref:typename:int	file:

In this pull request, ctags doesn't emit tags for parameters of a prototype if the prototype kind is disabled <1>.
On the other hands, ctags emits tags for parameters of a function even if the function kind is disabled <2>.

In the most of all parsers I wrote, I implemented the behavior <2>: tagging children even if the kind for their parent object is disabled. However, in this pull request I chose <1> because emitting tags for parameters of prototype when disabling prototype kind looks too noisy.

@pragmaware, how I should do?
See #3043 about the background of this pull request.

Close universal-ctags#3045.

If types of parameters are declared in a prototype, this change
generates anonymous names for them and makes tags.
@masatake masatake force-pushed the cxx-parameters-in-prototypes branch from 93badee to a22956a Compare June 11, 2021 05:20
@pragmaware
Copy link
Contributor

[...]
Other than scope, I think they are the same.

Well, they obviously refer to the same "physical" parameters, so in this sense they are the same. However, the usage by the ctags clients may be very different. Take the very common operation of "completion": most editors that use ctags do that. The parameters of the function are essentially local variables: one wants to complete them in the scope of the function and thus they are very useful. The parameters of the prototype have no such role: one never wants to complete them or even to look them up by name. The only use of prototype parameters is to help analyzing the signature of the function "from outside", for example when filling the arguments of the call. But that's a rather advanced feature, few editors really have it and now that I look at it ctags doesn't even have enough information to implement it that way (index of the parameter is missing).

My editor, for instance, has the first feature (completion) but has no use for the parameters of the prototype: it would discard them. That's why I'd like to keep function parameters on but prototype parameters off. But in the end, it's not a big deal: they are easy to filter out, now that they are marked with the "prototype" scope.

About template parameter, using different switch 'Z' is good for me. For me, template parameter looks very different concept from function and prototype parameter. Maybe because I'm not familier with C++.

From an editor-using-ctags perspective they are kind-of similar. Inside a function definition the template parameters are placeholders for types. One completes and resolves them like types in the function body. The prototype ones, again, are used as map/translation but never as completion. Editors fully handling template parameters are even more exotic beasts...

Anyway we already introduced Z. So we should not change it.

I agree... though for the prototype template parameters nobody would notice if we added another kind :)

In the most of all parsers I wrote, I implemented the behavior <2>: tagging children even if the kind for their parent object is disabled. However, in this pull request I chose <1> because emitting tags for parameters of prototype when disabling prototype kind looks too noisy.

Makes sense. I can't think of a good reason one would want to see the prototype parameters but not the prototypes...

@masatake
Copy link
Member Author

Thank you for the ​suggestive comment.

... (index of the parameter is missing).

Yes. I have been thinking about adding "nth:" as a COMMON field.

input.c:

struct point *make_point(int x0, int y0);
struct point {
  struct object base;
  int x;
  int y;
};
void abort(void);
struct char *format(char *fmt,...);

expected.tags:

make_point	f.c	/^struct point *make_point(int x0, int y0);$/;"	p	typeref:struct:point *	file:	arity:2
x0	f.c	/^struct point *make_point(int x0, int y0);$/;"	z	prototype:make_point	typeref:typename:int	file:	nth:0
y0	f.c	/^struct point *make_point(int x0, int y0);$/;"	z	prototype:make_point	typeref:typename:int	file:	nth:1
point	f.c	/^struct point {$/;"	s	file:
base	f.c	/^  struct object base;$/;"	m	struct:point	typeref:struct:object	file:	nth:0
x	f.c	/^  int x;$/;"	m	struct:point	typeref:typename:int	file:	nth:1
y	f.c	/^  int y;$/;"	m	struct:point	typeref:typename:int	file:	nth:2
abort	f.c	/^void abort(void);$/;"	p	typeref:typename:void	file:	arity:0
format	f.c	/^struct char *format(char *fmt,...);$/;"	p	typeref:struct:char *	file:	arity:1+
fmt	f.c	/^struct char *format(char *fmt,...);$/;"	z	prototype:format	typeref:typename:char *	file:	nth:0

(signature fields are omitted in this example.)
pParamInfo keeps parameters in arrays. So filling nth fields for parameters may be easy.
I would like to implement nth after merging this.
I think nth: field for struct members is not so bad idea. It may be less useful than that for parameters.
In C language, for representing inheritance, developers put an object of the parent class to nth:0 member of a structure.
In other words, from the nth:0, we can know class hierarchy.

I'm also thinking about arity: field. I'm not sure how useful it is.

About template parameter(tparam), whether a tparam is for template prototype or template function can be known from its scope field. See f281087 included in this pull request or #3062.

@masatake
Copy link
Member Author

I found nth: field is not enough if the target language, like C++, supports overloading.

input.cc:

void f(int x) {}
void f(float y, int x) {}
void f(int x, float y) {}

u-ctags --sort=no --fields=+oS --kinds-C++=+zp -o - input.cc | grep ^x

x	input.cc	/^void f(int x) {}$/;"	z	function:f	typeref:typename:int	file:	nth:0
x	input.cc	/^void f(float y, int x) {}$/;"	z	function:f	typeref:typename:int	file:	nth:1
x	input.cc	/^void f(int x, float y) {}$/;"	z	function:f	typeref:typename:int	file:	nth:0

I wonder how about introducing inSignature: field like:

x	input.cc	/^void f(int x) {}$/;"	z	function:f	typeref:typename:int	file:	nth:0	inSignature:(int x)
x	input.cc	/^void f(float y, int x) {}$/;"	z	function:f	typeref:typename:int	file:	nth:1	inSignature:(float y, int x)
x	input.cc	/^void f(int x, float y) {}$/;"	z	function:f	typeref:typename:int	file:	nth:0	inSignature:(int x, float y)

@pragmaware
Copy link
Contributor

pragmaware commented Jun 14, 2021

I found nth: field is not enough if the target language, like C++, supports overloading.

Yes, this is a general problem with overloads, not only related to the nth field. In my editor I solve it by looking at line numbers.
When looking for the containing scope of a symbol I preferentially pick the one that is defined in the same file, with a line number that is lower or equal and the closest one. This mostly works.

However, ctags internally has the hierarchy information in this case so it could be output in some way. For example, we could assign a synthetic unique identifier to the parent symbol (function definition) and reference it in the child symbol (parameter)... this would be unambiguous.

@pragmaware
Copy link
Contributor

Now that I think of it, it's not only a problem with overloads. It's potentially with any symbol. For example, when taking into consideration a large C/C++ code base, possibly including libraries, it's not uncommon to have multiple classes with the same name in the global namespace. Or maybe the same static helper function repeated in many cpp files... It's just enough to make sure the compiler can see only one of these symbols at a time. I believe this can happen also in most other languages.

So a tree-building ctags client has always to solve the child-parent matching problem in some way. File+line-number matching is one way but it's not 100% reliable. Having unique ids for symbols would be a nice addition.

@masatake
Copy link
Member Author

Having unique ids for symbols would be a nice addition.

Do you think cork-indexes are appropriate as the ids?

To improve ctags in the view of this topic, I thought what we needed.

A. tags.d. Managing whole the source tree in a tags file is hard. I'm thinking about per-source file tags files. tags.d is a name of directory where the tags files are stored. This allows incremental updating.

B. filesystem langauge. ctags is a file oriented program. It doesn't consider relation between two input files. filesystem language provides the way to record more information about input files to tags file in extensible way. ctags has file kind already. However, it is hard to attach more information on it.

C. multi-pass parsing. An experimental implementation is proposed in #2741. #2471 reads a tags file as hint. I must read whole the tags file. If the tags file is larger, loading takes more time. Instead of dealing a tags file, tags files under tags.d may allow on-demand loading.

I'm not sure these are the way to go.
Much more studies are needed.
I wonder how code3 uses a tags file.

@masatake
Copy link
Member Author

@pragmaware, can I merge this pull request?

@pragmaware
Copy link
Contributor

Do you think cork-indexes are appropriate as the ids?

Anything unique in the context of a tags file will do. If the cork indexes are unique and don't change then they are OK.

I wonder how code3 uses a tags file.

I must say that a tags file by itself has limited application. When it's properly sorted it can be reasonably used to lookup a single symbol at a time without consideration of scope. However this is only the most basic form of "developer assistance". Most IDEs and code editors do far more than that and use derived structures that are more complex (and efficient) than a tag file.

In fact, in code3 case the tags file is never really written to disk. ctags is launched on subtrees of code (whole source tree, an include directory, a single file as it is being edited and so on) and its standard output is processed directly into memory.

The memory structure is called "symbol store" and is composed of different pieces:

  • a list of "symbol files", which are lists of symbols associated to a single source file sorted by line number plus a few additional maps used to speed-up the context lookups
  • a symbol tree for each language that is being manipulated: a real tree, with all the container -> member relationships
  • a sorted list of symbols for each language (similar to a tag file but with in-memory objects interconnected with the other structures)

If you want to take a look, the symbol store is here, while the symbol tree parts are here.

This whole structure allows the editor to quickly do complex and accurate completions. Say that the user is editing the following file with the cursor placed at position <cursor>:

...
Class2 c;
...
void Class1::method1()
{
   c.mem<cursor>
}
...

If the user now presses tab, the editor will:

  • extract the text before the cursor -> c.mem
  • figure out that mem is the token to complete and it is a member of an aggregate that is in the c variable
  • find the current symbol file in the symbol store
  • lookup the current context in the symbol file, that is, find the symbol (function) that is containing the current line number (that would be function method1 - note how start and end line numbers are useful here)
  • recursively look for the symbol c: the algorithm would first look for variables or parameters inside the context symbol method1, then look for member variables in its container Class1... and finally climb up to the global scope to find c as global variable.
  • resolve the type of c, that is Class2 and look it up in the global scope (the scope of definition of c)
  • find all the members (variables or functions) of Class2 that start with the string mem
  • if there is only one then just complete c.mem to the name of the symbol
  • if there are multiple then complete the common part and present a choice dialog to the user

Quite complicated process. In more involved cases may take a noticeable time with optimized in-memory structures. Totally unfeasible with only a tag file.

I don't think ctags should try to do this process internally. The reasons are:

  • it's really too complicated
  • it requires several complicated structures in memory that must be updated on-the-fly
  • it assumes a global knowledge of the whole source tree the user is manipulating, something that a single ctags run may not have
  • would almost certainly slow ctags down quite a lot.. but this process must be fast.
  • would possibly require compatibility-breaking changes

What is mean is simply: the tags file is good as it is. It's just a plain symbol storage format.

I think ctags should:

  • provide the maximum amount of information attached to each symbol
  • allow the user to choose what is needed and what not (because the useless information is wasted space/time).
  • possibly provide a more reliable way to re-build the tree: as I have noted in previous comments there is considerable amount of "guessing"
  • eventually provide a separate library to implement the process I've described, all in memory

Copy link
Contributor

@pragmaware pragmaware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go for it!

@masatake masatake merged commit 1020768 into universal-ctags:master Jun 16, 2021
@masatake
Copy link
Member Author

Do you think cork-indexes are appropriate as the ids?
Anything unique in the context of a tags file will do. If the cork indexes are unique and don't change then they are OK.

A cork index is unique in a input file. However, it can be duplicated. So unique id for each input file is needed in addition to the cork index.

@pragmaware
Copy link
Contributor

A cork index is unique in a input file. However, it can be duplicated. So unique id for each input file is needed in addition to the cork index.

Just use a global integer incremented at each new symbol?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

C: report parameters in a prototype with anonymous tags
2 participants