Doubled slashes in master_sites
It's a common occurrence that you'll see something like this while fetching a port:

    $ sudo port install python24
    ---> Fetching python24
    ---> Attempting to fetch Python-2.4.4.tar.bz2 from http://www.python.org//ftp/python/2.4.4/
    ^C
    $

Note the double slash after the hostname, which should just be a single slash. This is not a major problem, because the server still responds and delivers the desired file. But it's not 100% correct, and I'm a stickler for perfection, so here we go.

The problem comes about because of these definitions in the portfile:

    homepage        http://www.python.org/
    master_sites    ${homepage}/ftp/python/${version}/

The homepage ends with a slash, as it definitely should. But then the port author has defined master_sites with a slash after the homepage variable, which should not have been done. The correct definition for master_sites would be:

    master_sites    ${homepage}ftp/python/${version}/

I fix this in open ports when I see it and send patches to the maintainers of closed ports. In response to one such patch, Markus suggested that MacPorts base should automatically fix this. I suppose there is precedent, insofar as MacPorts will fix master_sites to end with a slash, if it does not already.

I'm not sure if I'd be in favor of automated stripping of doubled slashes within URLs. It introduces a bit of magic into the master_sites variable, and I think magic should be avoided. And theoretically, a server could behave differently depending on the number of slashes. In practice, though, Apache collapses doubled slashes into a single one, and I don't know of any sites that would rely on double slashes in their download URLs.

What do you all think?
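For illustration only, here is a rough Python sketch of what the automated stripping discussed above could look like. It is not MacPorts base code (base is written in Tcl), and the function name collapse_path_slashes is made up for this example. It assumes that only the path part of the URL should be touched, so the "//" after the scheme and anything in the query string are left alone.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_path_slashes(url):
    """Collapse runs of slashes in the path of a URL (hypothetical helper).

    The scheme separator ("://"), the host, the query, and the fragment
    are left unchanged; only the path component is rewritten.
    """
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))


# Reproduce the expansion from the portfile in the message above.
homepage = "http://www.python.org/"
version = "2.4.4"
master_site = f"{homepage}/ftp/python/{version}/"

print(master_site)                         # http://www.python.org//ftp/python/2.4.4/
print(collapse_path_slashes(master_site))  # http://www.python.org/ftp/python/2.4.4/
```

Restricting the rewrite to the path is a design choice made for this sketch; a server that really does distinguish "//" from "/" in its paths would still be affected by it, which is the concern raised above.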
On 12/10/07, Ryan Schmidt <ryandesign@macports.org> wrote:
I'm not sure if I'd be in favor of automated stripping of doubled slashes within URLs. It introduces a bit of magic into the master_sites variable, and I think magic should be avoided. And theoretically, a server could behave differently depending on the number of slashes. In practice, though, Apache collapses doubled slashes into a single one, and I don't know of any sites that would rely on double slashes in their download URLs.
What do you all think?
RFC 2396 (Uniform Resource Identifiers (URI): Generic Syntax) dictates that "URI that are hierarchical in nature use the slash "/" character for separating hierarchical components." So it seems to me that the correct behavior is that null path components (the component between the two slashes in "//") evaluate as a single slash "/".

We should clean that up programmatically, since some port maintainers may prefer to leave the extra slashes around the variables for legibility purposes.

--
Randall Wood
randall.h.wood@alexandriasoftware.com
"The rules are simple: The ball is round. The game lasts 90 minutes. All the rest is just philosophy."
On Dec 10, 2007, at 03:44, Randall Wood wrote:
On 12/10/07, Ryan Schmidt wrote:
I'm not sure if I'd be in favor of automated stripping of doubled slashes within URLs. It introduces a bit of magic into the master_sites variable, and I think magic should be avoided. And theoretically, a server could behave differently depending on the number of slashes. In practice, though, Apache collapses doubled slashes into a single one, and I don't know of any sites that would rely on double slashes in their download URLs.
What do you all think?
RFC 2396 (Uniform Resource Identifiers (URI): Generic Syntax) dictates that "URI that are hierarchical in nature use the slash "/" character for separating hierarchical components." so it seems to me that the correct behavior is that null path components (the component between the two slashes in "//") evaluate as a single slash "/".
Thanks for the RFC reference. In particular, section 3.3 "Path Component" says that for URLs that have a path component, which all our master_sites URLs do, path_segments is made up of one or more segments combined with slashes, and a segment is defined as one or more pchars (and a slash is not a valid pchar). Therefore I think there can be no RFC-compliant URL (having a path component) which has a slash next to another slash.
We should clean that up programmatically, since some port maintainers may prefer to leave the extra slashes around the variables for legibility purposes.
I'm prepared to agree to that. But we should consider other variables and other situations as well. Some people write "${destroot}/${prefix}" in some places, when ${prefix} already starts with a slash so it should really be written "${destroot}${prefix}". This kind of thing occurs all over portfiles, and I don't think there are any specific variables we could target for any cleanup operations. So one could argue: why should we apply this magic to master_sites when we can't apply it elsewhere? Shouldn't we instead educate portfile authors that they are concatenating variables together, and to think about what those variables contain so that they put slashes where they belong and leave them off where they don't belong?

In fact... I'll bet we can write something into "port lint" to detect and report such issues. I might prefer such education over the magic.
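As a rough idea of what such a check might do, here is a small Python sketch. It is not how "port lint" is implemented, and the function name find_doubled_slash_joins, the report strings, and the example destroot path are all made up for illustration. It assumes the values of the variables are already known, and it flags any ${variable} reference whose value begins or ends with a slash while a literal slash sits right next to it in the portfile text.

```python
import re

# Matches a ${name} reference as written in a portfile line.
VAR_REF = re.compile(r"\$\{(\w+)\}")


def find_doubled_slash_joins(template, values):
    """Report variable references that would expand next to a literal slash.

    template: the unexpanded text, e.g. "${destroot}/${prefix}"
    values:   known variable values, e.g. {"prefix": "/opt/local"}
    Returns a list of human-readable findings (hypothetical format).
    """
    findings = []
    for match in VAR_REF.finditer(template):
        value = values.get(match.group(1))
        if value is None:
            continue  # unknown variable, so nothing can be said about it
        before = template[:match.start()]
        after = template[match.end():]
        if value.startswith("/") and before.endswith("/"):
            findings.append(f"slash before {match.group(0)}")
        if value.endswith("/") and after.startswith("/"):
            findings.append(f"slash after {match.group(0)}")
    return findings


print(find_doubled_slash_joins(
    "${homepage}/ftp/python/${version}/",
    {"homepage": "http://www.python.org/", "version": "2.4.4"},
))

# The destroot value below is a placeholder, not a real MacPorts path.
print(find_doubled_slash_joins(
    "${destroot}/${prefix}",
    {"destroot": "/tmp/destroot", "prefix": "/opt/local"},
))
```

A check like this only reports the problem and leaves the portfile text alone, which fits the "educate rather than magic" preference stated above.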
participants (2)
- Randall Wood
- Ryan Schmidt