Re: [24079] trunk/base/src/pextlib1.0/find.c
On Apr 15, 2007, at 10:44 PM, source_changes@macosforge.org wrote:
Rewrite Pextlib find command. Now actually has useful API. Usage is find ?-depth? ?-ignoreErrors? target ?target ...? varname body. - depth identical to -d switch to find. -ignoreErrors causes it to ignore permission/read errors. Does not follow symlinks.
I'm not sure I exactly agree with that statement. First, the usage statement is a little confusing - can you cite some actually usage examples? Second, you've forced the execution of the body every time, and that was not the original intention of this API. The original intention was to have an Expr evaluated as a predicate and ONLY execute the body if the expr matched, allowing you to separate the "work statements" from the "match statements" - now you have to write a conditional into every body instead and it doesn't read as well. The fact that many people used find with an expr of "1" is besides the point - you could just as easily match against a specific filename regexp, or file type, or whatever. Just as find(1) has a whole bunch of predicates which precede -exec, so this command was intended to give you one complex predicate (with all the usual boolean operators) and the equivalent of "-exec" in the body. I'm having a hard time seeing this as anything but a regression, so perhaps you could explain your rewrite to the original author, at least? :-) - Jordan
The users of the original find command was, well, nobody. The few times I considered using it, I shied away as well. Why? Because I didn't want to add dependencies on the API when it was, frankly, pretty bad. It set a global variable (yes, not even local to the calling procedure, but global) called _filename. I'm also confused as to your complaint about it executing the body every time. With your original implementation, it executed the match every time, and every time it succeeded (most of the time, as it defaulted to "expr 1") it would then execute the body (which defaulted to printing the file, though I'm not sure why as any practical usage of the command doesn't want to print the result). The phrase "The fact that many people used find" is pretty strange seeing as, to the best of my knowledge, absolutely nobody was using the function. I used it for a very brief time, and then I stopped because I didn't want to rely on an API that I intended to change. In any case, the standard usage for something like this is to traverse a filesystem and perform some action on each file that matches the criteria. The new find command is intended specifically for this. It's basically a foreach on a recursive glob. A theoretical usage might look as follows: find src file { switch -glob $file { .svn - CVS { continue } *.o { file delete $file } } } This particular example deletes all object files in the src directory, skipping the .svn and CVS dirs because it knows it doesn't need to recurse into them. Doing the same with the old command would be quite bizarre - the match body has to match directories or it won't recurse, which means the action body has to also do its own matching to make sure it's acting on the object file and not on a directory. BTW, if you're confused, I wrote the usage wrong in that commit message - varname comes before the target list. The correct usage is documented in the portfile.7 manpage. Can you please explain why you think this is a regression? I will note that I tried to use your find command at one point, and ended up having to pour over the source code to figure out how it was intended to be used and why it wasn't behaving the way I expected it to. On Apr 16, 2007, at 4:15 AM, Jordan K. Hubbard wrote:
On Apr 15, 2007, at 10:44 PM, source_changes@macosforge.org wrote:
Rewrite Pextlib find command. Now actually has useful API. Usage is find ?-depth? ?-ignoreErrors? target ?target ...? varname body. -depth identical to -d switch to find. -ignoreErrors causes it to ignore permission/read errors. Does not follow symlinks.
I'm not sure I exactly agree with that statement. First, the usage statement is a little confusing - can you cite some actually usage examples?
Second, you've forced the execution of the body every time, and that was not the original intention of this API. The original intention was to have an Expr evaluated as a predicate and ONLY execute the body if the expr matched, allowing you to separate the "work statements" from the "match statements" - now you have to write a conditional into every body instead and it doesn't read as well. The fact that many people used find with an expr of "1" is besides the point - you could just as easily match against a specific filename regexp, or file type, or whatever. Just as find (1) has a whole bunch of predicates which precede -exec, so this command was intended to give you one complex predicate (with all the usual boolean operators) and the equivalent of "-exec" in the body.
I'm having a hard time seeing this as anything but a regression, so perhaps you could explain your rewrite to the original author, at least? :-)
-- Kevin Ballard http://kevin.sb.org eridius@macports.org http://www.tildesoft.com
On Apr 16, 2007, at 2:09 AM, Kevin Ballard wrote:
The users of the original find command was, well, nobody. The few times I considered using it, I shied away as well. Why? Because I didn't want to add dependencies on the API when it was, frankly, pretty bad. It set a global variable (yes, not even local to the calling procedure, but global) called _filename.
I'm not defending making the _filename var a fixed global - I even said in the previous threads on this topic that this was a clear and obvious shortcoming that needed fixing (which is also why I raised no complaint whatsoever about this aspect of the change).
I'm also confused as to your complaint about it executing the body every time. With your original implementation, it executed the match every time, and every time it succeeded (most of the time, as it defaulted to "expr 1") it would then execute the body (which defaulted to printing the file, though I'm not sure why as any practical usage of the command doesn't want to print the result).
The fact that { 1 } and { print $_filename } are the defaults shouldn't be taken as an indication that we don't need a flexible API. You can type "find ." and get useful results at the shell, too, but no one argues that find(1)'s other options should be deprecated since that's all one would ever want to do with it. I do a lot of things with find(1) that don't print the result - they manipulate the filenames found silently and move on. I'm also hardly defending the original find API as perfect - never said it was. However, like xinstall, I think it's a worthwhile goal to make it follow in the footsteps of its command-line analog so as to ease the process of people coming up to speed with it. It's not going to be identical, naturally, since Tcl is a "real language" unlike /bin/sh and there are more powerful things you can do with it, but it should at least follow the POLA as much as it can. I'm still waiting for some usage examples which demonstrate how this is a marked improvement.
In any case, the standard usage for something like this is to traverse a filesystem and perform some action on each file that matches the criteria. The new find command is intended specifically for this. It's basically a foreach on a recursive glob. A theoretical usage might look as follows:
find src file { switch -glob $file { .svn - CVS { continue } *.o { file delete $file } } }
And maybe it's just me, but I would find this to be far more readable (and concise): find file src {string match *.o $file} {file delete $file} Note that I'm also talking about "general case" usage here, where the user is really trying to accomplish a simple and straight-forward job in one line without too much knowledge of the internals of find's recursion behavior, something I think you'll find to be true in the majority of usage scenarios rather than the converse. Is that one line equivalent to your script? No, because as you note, we don't short-circuit recursion in the .svn or CVS cases, but I also think your example is just a wee bit contrived given that most folks will be using this primitive to do operations in $destroot and not checked-out trees of source where this kind of short-circuit behavior is typical or even important. Still, if you were to allow multiple expr/body pairs, I suppose you could always do it like this: find file src {regex .svn|CVS $file} {return 1} {string match *.o $file} {file delete $file} Where "return 1" could have the special case meaning of "don't recurse." Again, however, I don't see this as the typical usage case, just as people don't typically use find(1) with 11 predicates and parenthesized sub-expressions even though that's technically possible. They use maybe one predicate and a fairly typical action, like -exec rm {} or print. That's the usage case I was trying to optimize for in my version and while I don't think I necessarily did the best job, I think you've gotten further away, not closer, to the goal of making it really simple for the majority of intended usage scenarios. - Jordan
On Apr 16, 2007, at 5:35 AM, Jordan K. Hubbard wrote:
On Apr 16, 2007, at 2:09 AM, Kevin Ballard wrote:
I'm also confused as to your complaint about it executing the body every time. With your original implementation, it executed the match every time, and every time it succeeded (most of the time, as it defaulted to "expr 1") it would then execute the body (which defaulted to printing the file, though I'm not sure why as any practical usage of the command doesn't want to print the result).
The fact that { 1 } and { print $_filename } are the defaults shouldn't be taken as an indication that we don't need a flexible API. You can type "find ." and get useful results at the shell, too, but no one argues that find(1)'s other options should be deprecated since that's all one would ever want to do with it. I do a lot of things with find(1) that don't print the result - they manipulate the filenames found silently and move on.
I'm also hardly defending the original find API as perfect - never said it was. However, like xinstall, I think it's a worthwhile goal to make it follow in the footsteps of its command-line analog so as to ease the process of people coming up to speed with it. It's not going to be identical, naturally, since Tcl is a "real language" unlike /bin/sh and there are more powerful things you can do with it, but it should at least follow the POLA as much as it can. I'm still waiting for some usage examples which demonstrate how this is a marked improvement.
Unlike xinstall/install, find is a command that's far more likely to be used in the MacPorts base code than in a Portfile. Making it match the find command-line usage simply makes it more complicated to use, since it doesn't actually match the command-line usage. And I'm not sure why you're arguing against my changes in this context, since your implementation didn't match command-line usage either. And yes, you can type "find ." and get useful results at the shell, and that's because you're either doing that to tell yourself about files, or you're piping the results elsewhere. Inside a tcl script you're not examining the filesystem by hand with find, nor are you piping the results.
In any case, the standard usage for something like this is to traverse a filesystem and perform some action on each file that matches the criteria. The new find command is intended specifically for this. It's basically a foreach on a recursive glob. A theoretical usage might look as follows:
find src file { switch -glob $file { .svn - CVS { continue } *.o { file delete $file } } }
And maybe it's just me, but I would find this to be far more readable (and concise):
find file src {string match *.o $file} {file delete $file}
Except, of course, that what you just typed wouldn't work. It would find all the object files in the src directory, but not in any of its subdirectories because it wouldn't match any of its subdirectories. And, of course, it doesn't match the behaviour of my example - it will recurse into .svn and CVS directories. To match the same behaviour you'd have to do the following: find file src {expr {[string match *.o $file] || ([file type $file] eq "directory"] && ![string match $file */.svn] && ![string match $file */.svn])}} { if {[file type $file] ne "directory"} { file delete $file } } Pretty ugly, no? Sure, you could try cleaning it up like find file src { switch -glob $file { */.svn - */CVS - *.o { return 1 } default { return 0 } } } { if {[string match $file *.o]} { file delete $file } } But that's hardly any cleaner. Incidentally, it just occurred to me that my example won't quite work right, as $file contains the full path. Either the switch should be on [file tail $file] or the .svn and CVS patterns should be */.svn and */CVS.
Note that I'm also talking about "general case" usage here, where the user is really trying to accomplish a simple and straight- forward job in one line without too much knowledge of the internals of find's recursion behavior, something I think you'll find to be true in the majority of usage scenarios rather than the converse. Is that one line equivalent to your script? No, because as you note, we don't short-circuit recursion in the .svn or CVS cases, but I also think your example is just a wee bit contrived given that most folks will be using this primitive to do operations in $destroot and not checked-out trees of source where this kind of short-circuit behavior is typical or even important. Still, if you were to allow multiple expr/body pairs, I suppose you could always do it like this:
find file src {regex .svn|CVS $file} {return 1} {string match *.o $file} {file delete $file}
Where "return 1" could have the special case meaning of "don't recurse." Again, however, I don't see this as the typical usage case, just as people don't typically use find(1) with 11 predicates and parenthesized sub-expressions even though that's technically possible. They use maybe one predicate and a fairly typical action, like -exec rm {} or print. That's the usage case I was trying to optimize for in my version and while I don't think I necessarily did the best job, I think you've gotten further away, not closer, to the goal of making it really simple for the majority of intended usage scenarios.
I think the biggest problem here is you're optimizing for Portfile use case while I'm optimizing for MacPorts base code use case. Why am I doing this? Because I know right off the bat two places where this is useful in base code - the delete proc in portutil and pippings new +universal build stuff for ports like openssl that need two single- architecture builds. I can understand what you're saying about making it work similar to find. I can see about 16 uses of system "find ..." in Portfiles right now, but trying to replace them with your old command would be difficult. If you really want to work towards acting like find(1), maybe we should implement a command that actually does act like find (1)? And then we can rename my version of find to fs-traverse. How does that sound? -Kevin Ballard -- Kevin Ballard http://kevin.sb.org eridius@macports.org http://www.tildesoft.com
On Apr 16, 2007, at 2:23 PM, Kevin Ballard wrote:
Unlike xinstall/install, find is a command that's far more likely to be used in the MacPorts base code than in a Portfile. Making it match the find command-line usage simply makes it more complicated to use, since it doesn't actually match the command-line usage. And I'm not sure why you're arguing against my changes in this context, since your implementation didn't match command-line usage either.
You're right, neither of our implementations was the find(1) counterpart that xinstall is to install(1). However, you've already renamed your find so I guess this point is somewhat moot and now we simply need to define the feature set of the find procedure. I hope we're at least on the same page with respect to using that proc predominantly from Portfiles vs from base?
find file src {string match *.o $file} {file delete $file}
Except, of course, that what you just typed wouldn't work. It would find all the object files in the src directory, but not in any of its subdirectories because it wouldn't match any of its subdirectories.
Well, again, you don't need to tell find(1) to traverse subdirs so I wasn't expecting that it needed to be explicit in this case either. If I just wanted to do file ops within the current directory, I'd use glob and eval. :-)
I think the biggest problem here is you're optimizing for Portfile use case while I'm optimizing for MacPorts base code use case. Why am I doing this? Because I know right off the bat two places where this is useful in base code - the delete proc in portutil and pippings new +universal build stuff for ports like openssl that need two single-architecture builds.
Well, perhaps now that we've bifurcated, we can satisfy the needs of both. So, shall we start defining the behavior of find before doing the implementation and avoid the risk of getting into unnecessary arguments post-facto? :-) Incidentally, I hope you've also twigged to the fact that modifying someone else's code in an OSS project is almost always bound to lead to a more heated discussion than if you'd actually discussed your proposed changes in advance and allowed for public comment before changing anything. MacPorts isn't unique in this way - if you'd gone and hacked on one of *BSD or Linux's device drivers and committed the changes without saying anything first, well, let's just say this conversation would still be burning hot with lots of folks piling on. MacPorts is small enough that this doesn't really happen yet, but the principles remain the same and once MP gets bigger, you'll see an increasing need for discussion in advance of "svn commit". - Jordan
On Apr 17, 2007, at 1:24 AM, Jordan K. Hubbard wrote:
find file src {string match *.o $file} {file delete $file}
Except, of course, that what you just typed wouldn't work. It would find all the object files in the src directory, but not in any of its subdirectories because it wouldn't match any of its subdirectories.
Well, again, you don't need to tell find(1) to traverse subdirs so I wasn't expecting that it needed to be explicit in this case either. If I just wanted to do file ops within the current directory, I'd use glob and eval. :-)
Yeah, but I was describing how your example would have behaved with your implementation, not how it should behave in an ideal world.
Incidentally, I hope you've also twigged to the fact that modifying someone else's code in an OSS project is almost always bound to lead to a more heated discussion than if you'd actually discussed your proposed changes in advance and allowed for public comment before changing anything. MacPorts isn't unique in this way - if you'd gone and hacked on one of *BSD or Linux's device drivers and committed the changes without saying anything first, well, let's just say this conversation would still be burning hot with lots of folks piling on. MacPorts is small enough that this doesn't really happen yet, but the principles remain the same and once MP gets bigger, you'll see an increasing need for discussion in advance of "svn commit".
Yeah, I know, but the fact of the matter was nobody used find, and I mean nobody. And the other part is that, in general, it's far more conducive to stop thinking of code as "my code" or "his code" and to think of it simply as project code. Sure, you wrote it, but in the end does that matter? Discussion should be about the merits of the code, not about ownership. -Kevin Ballard -- Kevin Ballard http://kevin.sb.org eridius@macports.org http://www.tildesoft.com
On Apr 16, 2007, at 10:30 PM, Kevin Ballard wrote:
Yeah, I know, but the fact of the matter was nobody used find, and I mean nobody. And the other part is that, in general, it's far more conducive to stop thinking of code as "my code" or "his code" and to think of it simply as project code. Sure, you wrote it, but in the end does that matter? Discussion should be about the merits of the code, not about ownership.
There are reasons beyond ego of ownership -- the author is going to generally be able to provide the greatest insight into design decisions and implementation. "My code" and "his code" are very real constructs in this context, and the authorship does matter. When you make a significant unilateral change to an existing contributor's code without consultation, this may reasonably be considered a raucous disregard for their design and implementation choices. At the very least, providing "change justification" demonstrates a full consideration of the implementation as compared to the existing code -- the author can then evaluate the changes according to his original design requirements.
participants (3)
-
Jordan K. Hubbard
-
Kevin Ballard
-
Landon Fuller