For a long time I’ve been an advocate of Python’s notion of controlling access to private and protected members (attributes, methods, etc.) by convention, simply naming them like “_name”, with a leading underscore. Even though Python does support “__name” (with a double underscore) for “private” members (this actually mangles the name rather than hiding it), you’ll notice that even this is rarely used in practice, and the largely agreed mantra is that convention should be enough and thus one underscore suffices. This always resonated quite well with me, since I generally prefer to handle situations by agreement rather than enforcement. Well, I’m now changing my opinion that this works well for this purpose, at least in certain situations.
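As a minimal sketch of the two naming styles mentioned above (the `Account` class and its attributes are made up for illustration), the single underscore is purely a convention, while the double underscore only renames the attribute:

```python
class Account:
    def __init__(self):
        self.balance = 0       # public
        self._audit_log = []   # "protected" by convention only
        self.__pin = "1234"    # mangled by the compiler to _Account__pin


acct = Account()

# Nothing stops outside code from touching the single-underscore attribute.
acct._audit_log.append("external write")

# Outside the class body no mangling happens, so the plain name is not found...
try:
    acct.__pin
    reachable_directly = True
except AttributeError:
    reachable_directly = False

# ...but the mangled name is still perfectly reachable.
mangled_pin = acct._Account__pin
```

So in both cases the language leaves the door open; the difference is only how much typing it takes to walk through it.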
This methodology may work quite well in situations where the code scope is within a very controlled environment, with one or more teams which follow strictly a single development guideline, and have the power to refactor the affected code base somewhat easily when the original decisions are too limiting.
Having worked on a few major projects now, and some of them being libraries which are used by several teams within the same company or outside, I now perceive that people very often take shortcuts over these decisions for getting their job done quickly. It’s way easier to simply read the code and get to the private guts of a library than to try to get agreement over the right way to do something, or sending a patch with a suggested change which was carefully architected.
Many people by now are probably thinking: “Well, that’s their problem, isn’t it? If their code base breaks on the next upgrade they’ll get burned and won’t be able to upgrade cleanly.”, and I can honestly understand this feeling, since I shared it. But, for a number of reasons, I now understand that this isn’t just their problem, it’s very much my problem too.
Most importantly, on any serious software, these problems will usually come back to the implementors, and many times the problem will have a much larger magnitude by then than it had at the time a change could have been done “the right way” on the implementation, because code dependent on the private bits will have settled.
Most people are optimists by nature and believe that the implementation won’t change, but, of course, one of the reasons why private information is made private in the first place is exactly because the implementor believes that having the freedom to change these details in the future is important, and quite often there’s already a plan of evolution in place for these private pieces, which may include revamping the implementation entirely for scalability or for other goals.
In the best case, the careless people will get burned on the upgrade and will either ask for support or silently not upgrade, and both cases hurt implementors, because providing support for broken software takes time and energy, and, amazingly, can even hurt the software’s image. Lack of upgrades also means more ancient versions in the wild to support. Besides these, in the worst case scenario, the careless people have enough influence on the affected project to cause as much burden on it as if the private data had been public in the first place.
As much as I’m a believer in handling situations by agreement rather than enforcement, I’m also a believer that when the rules don’t work for people, the rules should be changed, not the people. So my position now is that language-supported access constraints (public, protected, private), as available in languages like Java and C++, are a better alternative than convention as used today in Python, since they provide an additional layer of encouragement for people to not break the rules carelessly, and that helps in the maintenance and reuse of software that has greater visibility.
What do you think about the problems that arise from too many access constraints? It’s impossible for the API designer to anticipate every possible use for their code, and sometimes private variables just get in the way of breaking the rules carefully.
The Java assumption of “you will never have any valid reason to touch this class/variable” is just not always true. Sending a patch is a good idea, but not always feasible given how long it may take, and that it may only be of interest to you and nobody else.
Sometimes it’s not just a matter of getting your job done quickly, but of getting it done _at all_.
You also have to consider that Java access specifiers are only convention anyway; there are a lot of ways to break the encapsulation, ranging from some reflection hackery to decompiling the class files, just to name two I have personally used.
Anyway, you do have good points about maintenance headaches, but perhaps the downsides of access control are bigger than you present here.
Indeed it’s not possible to anticipate all use cases, but it’s possible to cover new use cases as we learn about them. It’s way better than having to handle a bad design forever.
That’s not the assumption which is made. The assumption is I don’t want you to touch this variable because your code will break, maybe in very unpredictable ways because private variables may have their semantics changed without warnings. Discuss with the maintainer what your use case is and try to get agreement on your use case so that it’s correctly supported by the maintained version. If there’s no agreement, fork the code base and knowingly maintain a fork, because the expectations of stability will be more realistic, and people will be more careful when merging forward from the maintainer’s code base, if that’s wanted. That’s quite different from pretending you’re using a stable version and using private data behind the scenes.
As I mentioned clearly in the post, they provide an additional layer of encouragement for people to not break the rules carelessly.
I know it’s debatable, but after a good level of experimentation, the pros and cons are somewhat clear to me. Thanks for your comments!
I quite agree with you on all points, but I wanted to add some comments about it.
It’s possible in Python to enforce the rule via an external tool (like pylint) that you can run on your code in your quality assurance checks.
But if you really want to ignore the rule (sometimes we need to), I prefer a clean way to do it, with the tool’s warning still active so it can remind you to clean up the code when possible. If you break the rule in C++, it’s with a horrible hack (like a cast to void *), which is unclean, can lead to confusion and errors, and is harder to spot with an automatic tool.
As usual, sorry for my english, I hope it’s understandable
The Java reflection trick is about as elaborate as figuring out the __name mangling in Python. If the user of the library goes through the trouble of using that specifically private name (java or python), it really is their problem.
I totally agree with your post. Not having true privacy in Python makes it difficult to use on large scale projects. Ruby has the right idea in that everything is private by default unless you explicitly say something is public. I’m hoping we’ll be able to do something similar in Python 3, maybe using metaclasses.
I don’t think you’ve thoroughly made the case. Can you give some examples of projects that were hurt, and how?
I find myself using Enthought’s Traits for these situations. I am not always happy with this, because it is a fairly heavy dependency (it has C code), but it does solve my problem, and also brings me many good things.
Recently I was using PyRSS2Gen when I reached a limitation and had to edit a little XML manually. For this I needed a datetime formatted just like PyRSS2Gen does, which I can do by using a private function in the module’s namespace… so far the convention aspect of Python has only served me.
I’m not sure if this can be implemented easily, but maybe we can have a solution in between convention and enforcement.
Couldn’t a test framework like nose be used to check if a class is accessing private members of another one?
The check could be run upon every commit and raise a flag, send emails around or whatever if someone is messing up.
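A rough sketch of what such a check might look like, assuming it is enough to flag any `obj._name` access where `obj` is not `self` or `cls`. The function name `find_private_access` and the sample source are hypothetical, and a real checker (pylint already has a warning along these lines) would need to handle many more cases:

```python
import ast


def find_private_access(source):
    """Return sorted (line, attribute) pairs for `obj._name` where obj is not self/cls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Attribute):
            continue
        name = node.attr
        is_dunder = name.startswith("__") and name.endswith("__")
        is_private = name.startswith("_") and not is_dunder
        owner = node.value
        on_self = isinstance(owner, ast.Name) and owner.id in ("self", "cls")
        if is_private and not on_self:
            hits.append((node.lineno, name))
    return sorted(hits)


# The sample is only parsed, never executed, so `somelib` need not exist.
SAMPLE = '''
import somelib

class Mine:
    def __init__(self):
        self._cache = {}

def use():
    return somelib._format_date(None)
'''

violations = find_private_access(SAMPLE)
```

A test runner or commit hook could call something like this over every changed file and fail, or just send the email, when the list is not empty.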
Maybe zope.interface can help in such situations
Have you seen cases of people delving into one of your libraries to use a double underscore name?
If you haven’t then we are merely working on the convention that a variable not mentioned in the ‘interface’ was used by the library consumer.
If you have, then I am curious as to the justification given by the consumer for going to the trouble of un-mangling the name?
This has been discussed before, where others have said that there are ways to access private names in Java and C++ and that if someone was prepared to de-mangle in Python, they would be just as reckless in other languages. I remain unconvinced.
Chris, I’ve provided enough detail of them above for a generic understanding. I don’t intend to turn the comment session here into a discussion of individual occurrences. We can discuss them in a different place if you want more details than this.
Ricardo, that’s very interesting. You basically validated my argument in the post. :-) You wanted a date formatting function, and instead of simply copying it into your own code base, you’ve used a private function which the module author doesn’t want you to rely on.
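To illustrate the alternative, here is a small sketch of keeping such a formatter in your own code base. It assumes the wanted format is the RFC 2822 style date that the standard library’s `email.utils.format_datetime` produces; the helper name `rss_date` is made up, and this is not PyRSS2Gen’s own code:

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_date(dt):
    """Format an aware datetime as an RFC 2822 style date string in GMT."""
    # usegmt=True requires an aware datetime whose tzinfo is timezone.utc.
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


stamp = rss_date(datetime(2009, 5, 18, 12, 0, 0, tzinfo=timezone.utc))
```

With a few lines like these under your own control, an upgrade of the library can’t change the output behind your back.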
Paddy, I get the impression that you haven’t read the post carefully, nor the comments above. The argument I made is that in Python the most used way of dealing with private attributes is convention, not double underscores, and that’s what I’m mainly discussing above. Yes, Python can mangle variables to make them “private”, but IMO that’s clearly an afterthought with a number of edge cases in the implementation (try to getattr(self, “__attr”) as a hint), which may explain why it’s not widely used. Additionally, if you compare Python to Java, you’ll quickly find that Java has other very interesting access models, such as package access, which gives more freedom to the maintainer while still restricting access elsewhere.
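A short sketch of the getattr hint (the `Widget` class is only an illustration): mangling is applied to identifiers in the class body at compile time, but not to string literals, so the two lookups below disagree.

```python
class Widget:
    def __init__(self):
        self.__attr = 42  # stored on the instance as _Widget__attr

    def read_directly(self):
        # The compiler rewrites this to self._Widget__attr.
        return self.__attr

    def read_by_string(self):
        # The string is not mangled, so this looks for a literal "__attr".
        return getattr(self, "__attr", "missing")


w = Widget()
direct = w.read_directly()
by_string = w.read_by_string()
```

Here `direct` is 42 while `by_string` falls back to the default, even though both run inside the class.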
If Python was changed in this regard, it wouldn’t be Python. Would it? So, perhaps your point is valid, but “what’s the point?” You can just use the language that serves your purpose or “fits your brain.”
Yes, in general it is allowed to improve a programming language over time. :-) This argument makes even less sense if we take into consideration how much Python 3 changed in the name of improving things.
You said that “I’m also a believer that when the rules don’t work for people, the rules should be changed, not the people.”
I wonder when the rule is important enough to change the people, though. For instance, if your development process is oriented to TDD and people don’t write the tests or do the job poorly, will you change them then?
Tim, that’s an interesting question. I’ve posted a new entry to better explain what I meant, and used your questioning as an example.
Guillaume, I totally agree with your point. Note, though, that this case falls into what I describe in the post: This methodology may work quite well in situations where the code scope is within a very controlled environment, with one or more teams which follow strictly a single development guideline, and have the power to refactor the affected code base somewhat easily when the original decisions are too limiting.
Your English is pretty good, by the way.
I prefer not to have private/protected members and methods in Python, because those things assume that libraries are perfect and library users are jerks that like to kludge.
The opposite is often true in practice; libraries are buggy and library users like me have been forced to patch them to meet deadlines and deliverables. (And don’t tell me that “you should report the bug, fix the library and/or wait for next version” thing.)
Not having protected/private members in Python is one problem less when you need to handle buggy libraries. If such things are added to Python, libraries will still be buggy and still will have to be patched in user code — it will just be more complicated to do, like C++.
Quite simply and without trying to judge people’s personal characteristics, access control assumes that there’s value in being able to separate the public interface from the implementation. It’s a contract between the library maintainer and users of the library. In open source, you can very easily break these kinds of contracts by forking the library and doing whatever you please with it. As I mentioned in the post, this will provide you with what you want, and will reduce problems for the real maintainer since you are the maintainer of the fork.
You’re helping to prove my point. Convention doesn’t work because there’s a ton of people like you out there, who hop over it for pretty much anything, and feel that interacting with the maintainer is too much work.
The fact that you should report the bug, fix the library and/or wait for the next version is certainly fantastic advice for someone who plans to work nicely in an open source environment, but being a good player or not is really up to you.
The fact that you’re spreading software which depends on private APIs of libraries in the wild is not just your problem, though, as I tried to explain in the post.
Just for a moment, put yourself into the position of a maintainer who has to maintain a library for tens of thousands of users. If everyone jumps over what is described as the public interface of the library, suddenly the maintainer of the library can’t offer any kind of stability guarantee anymore to users of software which depends on such a library. As a follow-up consequence, distributors that bundle the library and dependent software (e.g. Linux distributions) will have trouble asserting that upgrades which feel safe to the maintainer are not actually breaking dependent software (security upgrades be damned). Of course, the fact that a lot of software has poor or no test coverage at all, and that Python is a dynamically typed language, only helps to make matters worse in this scenario.
Hey Gustavo, i think your observations are clear but i disagree on the conclusion of introducing protected syntax. Rather would like to see better tools for communicating around code, see http://tetamap.wordpress.com/2009/05/18/code-centered-issue-tracking/ for some more discussion on this. cheers, holger
The problem isn’t means of communication. There are plenty of ways people can already communicate with maintainers. We’re talking about mass behavior here, not coding under a controlled team environment. If people quite frequently do not respect the private member convention, adding an entirely new convention on top of it which goes beyond any kind of known coding practices and expecting people to follow it in the large would be futile.
Also, even if we assume that we’d be able to cause such a major shift in behavior across the board, sending a couple of lines from a comment to the code maintainer doesn’t fix the issue by itself. Having tons of software in the wild which use the private API vs. having tons of software in the wild which use the private API and two more lines of comment doesn’t really change the picture presented in the post and detailed in the comments above.
you wrote in your original post:
“It’s way easier to simply read the code and get to the private guts of a library than to try to get agreement over the right way to do something, or sending a patch with a suggested change which was carefully architected.”
This is a valid observation and there is what i’d call “cost of communication” here. It can be high even in controlled team environments, but often is very high between very loosely or not-at-all coupled developers. And we all know, developers are lazy.
You say in your comment:
“Having tons of software in the wild which use the private API vs. having tons of software in the wild which use the private API and two more lines of comment doesn’t really change the picture presented in the post and detailed in the comments above.”.
So let’s presume we have a private keyword and i would not be able to get at the attribute i want. I’d file an issue with the developer (if i find out how) and then what? Would he be more likely to fix the issue because there is a “private” keyword? Would he rush to remove my blocking issue? Here is what i think is likely: while waiting i’d hack around in *even weirder ways* to get at what i need, maybe duplicating code, copy-pasting his class or whatever. I need to get my job done and being confronted with private/public distinctions doesn’t make my job easier.
Concluding, I see more value in easing code-centered communication between developers than shifting Python programming paradigms to use private/protected methods.
holger (wonders how formatting works on the blog here)
As I said earlier, if the public API of the library does not satisfy you, either copy the library and adapt to your needs, or work with the maintainer to improve it as wanted. In both cases, both the maintainer of the library and your users will be happier, since your software won’t break when the next stable version of the library comes out.
I never said it would make that job easier. In fact, I’ve explicitly stated that the problem is exactly that the “job” of accessing private attributes is too easy right now. What I suggest in this post is that stricter member access control makes the job easier when you have to maintain nicely working software for thousands/millions of users and hundreds/thousands of developers in an evolving environment.
Just this morning i thought of “how can i keep these methods private” … and realized i just apply this pattern:
def mymethod(self, …):
and someone having only an instance has no easy way to call that function, which of course has the same abilities as a method defined in the class body. To access that function from the caller side, one needs to import the module and fish the function out. This is probably something that you don’t consider sufficient. But I still wanted to mention it – for my purposes it works well because there really is no convenient way to import the module, in fact :)
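The snippet above is cut off, so what follows is only a guess at the pattern being described: a module-level function that takes the instance explicitly and is called from a public method. The names `_recompute` and `Counter` are made up for illustration.

```python
def _recompute(self, factor):
    # Module-level function: it receives the instance explicitly, so it can
    # do anything a method defined in the class body could do.
    self._total = self._total * factor
    return self._total


class Counter:
    def __init__(self, total):
        self._total = total

    def scale(self, factor):
        # Public method delegating to the module-level helper.
        return _recompute(self, factor)


c = Counter(10)
scaled = c.scale(3)
helper_on_instance = hasattr(c, "_recompute")
```

Since `_recompute` is not an attribute of the class or the instance, a caller holding only `c` would have to import the defining module and call the function from there.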