May 3, 2015

Abandoning all Perl modules

As of today I have decided to remove myself as maintainer/comaintainer of all my Perl modules. Feel free to adopt them.

Jan 7, 2015

Premature optimization is not evil

Or rather, people should stop saying it is, because most of the people who say it don't actually seem to know what is meant by "premature optimization" or how to determine when it is evil. I've heard people call all of the following premature optimization: asking "Is there a 3rd party library that does this more efficiently?" (knowing whether there are better options is premature optimization?); thinking about architecting your app for horizontal scalability (it is if the design is significantly more complex, but not if it's just a matter of using REST and ensuring statelessness, which is about the same complexity up front and would be harder to convert to later); wanting to do Dependency Injection; making code easier to read and simpler, and thus faster; and on and on. On the other hand, no one seems to think that requiring Redis, MongoDB, and Node.js because it's webscale is premature optimization, even if the clustering is horribly convoluted and you end up in callback hell (not saying you are, just saying). Basically, if you're not asking to do the thing that everyone else is doing, it's premature optimization.

So let's talk about what the hell premature optimization is. Premature optimization is spending a week making sure you can spin up infinite instances on AWS because someday you might get slashdotted. Premature optimization is writing a method in a less than clear manner because you think it's faster. Premature optimization is rewriting String.format to StringBuilder because StringBuilder is faster. Premature optimization is any time that you write code that is less readable for the sake of performance, or spend an inordinate amount of time optimizing it, without benchmarking to see if it's slow.

I've spent a significant amount of time in the past few months optimizing code. Why? Because no one ever thought about optimization. In one case it never occurred to the author that querying the same data from a database that had been previously queried, in a loop, outside of a transaction, was inefficient. It never occurred to the author that refetching from the database every time you do an on-screen sort would be inefficient. Why think about what you're writing? After all, premature optimization is evil, or at least that's what I'd be told.
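The "same query in a loop" problem can be sketched roughly like this. It is only an illustration under assumptions: QueryOnceDemo, resolveNames, and the slowLookup function standing in for the database call are all hypothetical names, not code from the application described above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class QueryOnceDemo {
    // Resolves a name for every id, but only asks the (hypothetical) slow
    // lookup once per distinct id instead of once per loop iteration.
    static Map<Integer, String> resolveNames(List<Integer> ids,
                                             Function<Integer, String> slowLookup) {
        Map<Integer, String> namesById = new HashMap<>();
        for (Integer id : ids) {
            // computeIfAbsent only calls slowLookup when the id is not cached yet
            namesById.computeIfAbsent(id, slowLookup);
        }
        return namesById;
    }

    public static void main(String[] args) {
        // The lambda stands in for a database query; it prints so the
        // number of "queries" is visible when run.
        Map<Integer, String> names = resolveNames(
                Arrays.asList(1, 2, 1, 1, 2),
                id -> {
                    System.out.println("querying id " + id);
                    return "name-" + id;
                });
        System.out.println(names);
    }
}
```

The code is no harder to read than the naive loop, which is the point: thinking about the repeated query costs almost nothing up front.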

Here are examples of premature optimization that are not evil. I choose to use EnumMap when I'm keying a Map by an enum; I presume it's more efficient, so I do it. Increase in code? A class name passed to the constructor. StringBuilder is faster than StringBuffer, so when I come across StringBuffer I convert it. Increase in code? None. I use dependency injection to wire stateless (or unchanging state) singletons so I'm not constantly creating instances; the code increase is the use of a DI framework. I use onClick handlers to ensure that things happen lazily, only when needed.
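To make the EnumMap point concrete, here is a minimal sketch. The Priority enum and the countByPriority helper are hypothetical, made up for the example; the only part taken from the text is that EnumMap costs one extra constructor argument.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical enum, used only for illustration.
enum Priority { LOW, MEDIUM, HIGH }

class EnumMapDemo {
    // Counts how often each priority occurs in the input.
    static Map<Priority, Integer> countByPriority(Priority... priorities) {
        // The only difference from "new HashMap<>()" is the class passed
        // to the constructor.
        Map<Priority, Integer> counts = new EnumMap<>(Priority.class);
        for (Priority p : priorities) {
            counts.merge(p, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByPriority(Priority.HIGH, Priority.LOW, Priority.HIGH));
    }
}
```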

Basically what I'm saying is that "Premature optimization is evil" is sadly used any time anyone is even thinking about anything that could remotely be considered optimization. I personally optimize my code for the paradigm/pattern matching the problem first (which leads to the second and third), readability second, performance last. Making smarter decisions about how to write your code is not premature optimization.

I think the real "evil" is encouraging people not to think about performance, or to further understand their craft.

Oct 7, 2014

10 ways of implementing Polymorphism

First, what is polymorphism and why is it so important? Polymorphism is the ability to have many implementations of a behavior that conform to a single interface. Put in perhaps slightly better, pragmatic terms, you have one implementation of a caller that can operate on many implementations of a "parameter", without conditionals and without changing the caller's code. For instance, take the following (pseudo?) Perl 6-ism: method handler( $obj ) { $obj.execute() }. As you can imagine, $obj can be anything that has an execute method. For this article I'll give you two implementations and one caller, in either Perl 5/6 or Java 7/8; boilerplate will be excluded for brevity.

Inheritance

Single Inheritance

Single inheritance is the most simple and well understood form of Polymorphism.
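A minimal Java sketch of two implementations and one caller through single inheritance. BaseJob, ReportJob, and handler are hypothetical names chosen to mirror the handler/execute example above.

```java
// Hypothetical parent class: the first implementation of execute().
class BaseJob {
    String execute() {
        return "running base job";
    }
}

// Hypothetical child class: the second implementation, overriding the parent.
class ReportJob extends BaseJob {
    @Override
    String execute() {
        return "running report job";
    }
}

class SingleInheritanceDemo {
    // The caller: no conditionals, works for BaseJob and any subclass of it.
    static String handler(BaseJob job) {
        return job.execute();
    }

    public static void main(String[] args) {
        System.out.println(handler(new BaseJob()));
        System.out.println(handler(new ReportJob()));
    }
}
```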

Multiple Inheritance

Multiple inheritance is often considered dangerous, is unavailable in Java, and suffers from the diamond problem. You should really only use this with a C3 MRO.

Flat Composition

Interfaces

Interfaces are probably the third most common form of polymorphism; they are essentially codified contracts.
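As a sketch of the contract idea in Java (the Executable interface and the two implementing classes are hypothetical names for illustration), two unrelated classes satisfy one interface and the caller only knows the interface:

```java
// Hypothetical contract: anything that can be executed.
interface Executable {
    String execute();
}

// Two unrelated implementations of the same contract.
class Backup implements Executable {
    @Override
    public String execute() {
        return "backing up";
    }
}

class Cleanup implements Executable {
    @Override
    public String execute() {
        return "cleaning up";
    }
}

class InterfaceDemo {
    // The caller depends only on the contract, never on a concrete class.
    static String handler(Executable executable) {
        return executable.execute();
    }

    public static void main(String[] args) {
        System.out.println(handler(new Backup()));
        System.out.println(handler(new Cleanup()));
    }
}
```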

Traits

These are just the same as interfaces in Java 8, you say? Well, yes, that's what Java 8 calls them. Traits are a list of methods flattened into a class, but they cannot access state. This basically describes what Java 8 is doing, as you can't access properties from within the interface. Well... at least not unless you do what I show here, which is basically access state through getters and setters.
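A minimal sketch of that getter trick with a Java 8 default method. The Named interface and Person class are hypothetical; the point is only that the default method reaches state through a getter the implementing class supplies.

```java
// Hypothetical trait-like interface: behavior with no fields of its own.
interface Named {
    // The implementing class owns the state and exposes it via this getter.
    String getName();

    // Default method (Java 8): behavior flattened into every implementer.
    default String greet() {
        return "Hello, " + getName();
    }
}

class Person implements Named {
    private final String name;

    Person(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }
}

class TraitDemo {
    public static void main(String[] args) {
        System.out.println(new Person("Ada").greet());
    }
}
```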

Mixins

Mixins are basically traits that can access state, though some mixins (AFAIK Ruby) are implemented sneakily as multiple inheritance, rather than flat list composition. IMHO, Mixins should be implemented using flat list composition.

Typeless

Duck Typing

The has $!log in the Mixin is actually a pretty good example of duck typing: we don't check for debug, we are just calling it. Java is basically incapable of doing this, except that you can treat everything as an Object (if that's all you need).

Function References

References to functions may or may not be allowed to have varied signatures depending on the language, but so long as they have the same signature they are interchangeable, and thus polymorphic. So why aren't normal functions (procedures), for example, polymorphic? The problem with procedures is that you have to import the implementation from outside the file, whereas with polymorphic code you can create your instance outside the file and pass it into code that's in the file; without changing that code, you can pass in a different implementation and it'll continue to work. To modify procedural code, you'd have to modify at least the import, and in compiled code that means a rebuild. It's worth noting these aren't so much typeless as there is only one type to be concerned with, a function.
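In Java 8 this can be sketched with method references and java.util.function.Function. The shout and whisper methods are hypothetical stand-ins for two implementations with the same signature.

```java
import java.util.function.Function;

class FunctionReferenceDemo {
    // Two implementations sharing the signature String -> String.
    static String shout(String text) {
        return text.toUpperCase() + "!";
    }

    static String whisper(String text) {
        return text.toLowerCase() + "...";
    }

    // The caller: accepts any function with the matching signature.
    static String handler(Function<String, String> implementation, String input) {
        return implementation.apply(input);
    }

    public static void main(String[] args) {
        System.out.println(handler(FunctionReferenceDemo::shout, "Hi"));
        System.out.println(handler(FunctionReferenceDemo::whisper, "Hi"));
    }
}
```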

Miscellaneous

I'm personally skeptical of whether these actually fit the definition of polymorphism, but they sort of do, just in completely different ways from the above.

Method Overloading

Method overloading is called ad hoc polymorphism and is kind of weird in that what it's really doing is hiding the type change from the programmer. Reality is you're kind of asking for different behavior, but you want to hide that it's different in the caller. However since it means you wouldn't have to change the caller, it counts.
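A small sketch of overloading in Java (describe is a hypothetical method name): the call site always reads describe(x), and the compiler picks the implementation from the static type of the argument.

```java
class OverloadDemo {
    // Three implementations that differ only in parameter type.
    static String describe(int value) {
        return "int " + value;
    }

    static String describe(double value) {
        return "double " + value;
    }

    static String describe(String value) {
        return "string " + value;
    }

    public static void main(String[] args) {
        // Same call shape, different behavior chosen at compile time.
        System.out.println(describe(1));
        System.out.println(describe(1.5));
        System.out.println(describe("one"));
    }
}
```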

Generics

I describe generics as class templates, because they remind me of having an HTML template, and then filling in the blanks by passing in variables, the variable happens to be a Type. Perl doesn't have Generics, and I'm not aware of plans for it in Perl 6.
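The "class template" view can be sketched like this, with Box as a hypothetical class whose blank is the type parameter T:

```java
// Hypothetical generic class: T is the blank that gets filled in.
class Box<T> {
    private final T value;

    Box(T value) {
        this.value = value;
    }

    T get() {
        return value;
    }
}

class GenericsDemo {
    public static void main(String[] args) {
        // Same template, filled in with two different types.
        Box<String> text = new Box<>("hello");
        Box<Integer> number = new Box<>(42);

        // No casts needed: the compiler knows what each get() returns.
        String s = text.get();
        int n = number.get();
        System.out.println(s + " " + n);
    }
}
```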

Reflection

Reflection is sort of polymorphic in that you can essentially treat all objects the same, via a single standard API. I don't know that I want to show this kind of reflective code because it gets real complicated fast, but, for example, in systems with a CDI-compliant injector a property can be annotated with @Inject; the injector will reflectively treat all objects carrying the annotation the same, and set the annotated property.
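This is not an injector, but the "treat all objects the same" idea can be sketched in a few lines with java.lang.reflect. The call helper is hypothetical, and the sketch assumes the target is an instance of a public class with a public no-argument method of the given name.

```java
import java.lang.reflect.Method;

class ReflectionDemo {
    // Calls a public no-argument method by name on any object.
    // Assumes the method exists and is accessible; otherwise it fails loudly.
    static Object call(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        // Two unrelated types, one caller, no shared interface required.
        System.out.println(call("abc", "length"));
        System.out.println(call(Integer.valueOf(42), "toString"));
    }
}
```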

Sep 20, 2014

Celebrity nude scandal, on security, an analogy

Though I won't say they aren't victims of a crime... What the victims did is fundamentally the equivalent of using skeleton keys in the modern day. What Apple did, or rather didn't do, is prevent that. Apple could have used a tool like cracklib and said, at the time of password creation: this is too short, this is not random enough, we are refusing to allow you to put this skeleton key lock on your front door. So while I think that the perp should be prosecuted to the full extent of the law, it should be like a Breaking & Entering where the door was left unlocked. Apple should be sued for not requiring secure passwords. Imagine if your lock company installed your locks wrong, and because of that you got broken into; they didn't do their job correctly. Would people just stand for that? No, I don't think so. Somehow physical locks are seen as easier to understand, and all this computer mumbo jumbo is hard, even though I suspect most people can't tell you why a deadbolt is a better lock. People should realize skeleton keys are no longer secure, even if they look cool and are easy to use. It's better to use a password manager (http://lastpass.com is what I use) with a randomly generated password for all other sites (I'd say 16 characters, though I think 12 is the current suggestion). Fundamentally this setup is a deadbolt with a different key required for each door, but one keychain. You can also do multifactor, which is like a key with a chip in it that will refuse to start your car if it's the wrong chip, so making a physical copy of the key (password) isn't enough.
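For a sense of what "randomly generated" means in practice, here is a rough sketch using java.security.SecureRandom. It is not how any particular password manager works; the PasswordDemo class and its alphabet are assumptions made up for the example.

```java
import java.security.SecureRandom;

class PasswordDemo {
    // Hypothetical alphabet; real sites differ in which symbols they accept.
    static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            + "abcdefghijklmnopqrstuvwxyz"
            + "0123456789"
            + "!@#$%^&*-_=+";

    private static final SecureRandom RANDOM = new SecureRandom();

    // Builds a password by picking each character independently at random.
    static String randomPassword(int length) {
        StringBuilder password = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            password.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return password.toString();
    }

    public static void main(String[] args) {
        // 16 characters, as suggested above; a different one for every site.
        System.out.println(randomPassword(16));
    }
}
```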

Sep 2, 2014

Using Spring to create a full REST API in less than 60 lines of code

Spring with Spring Data is awesome. Seriously, I've never been able to throw up a full HATEOAS REST web service this fast. To start, I'll admit my headline lie: I'm not counting the pom.xml.


cloc .
       5 text files.
       5 unique files.
       2 files ignored.

http://cloc.sourceforge.net v 1.62  T=0.04 s (104.8 files/s, 3930.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Maven                            1              6              7             65
Java                             3             15              0             57
-------------------------------------------------------------------------------
SUM:                             4             21              7            122
-------------------------------------------------------------------------------

The basics of the web service: we want to be able to create tasks, like those on a todo list. For now we want the simplest tasks possible, in as little code as possible. We should use UUIDs so that our service can scale horizontally, so that we can easily generate known test IDs, and so that we know no two entities will share an id if we ever want to flatten things. We need to be able to perform basic CRUD on all of our entities, as well as list them.

First let's create our Task. As you can see it's incredibly simple. We have our UUID identity; the uuid and uuid2 are basically telling Hibernate and H2/PostgreSQL to use UUIDs. You might ask why limit the description to 100 characters. Well, since these are quick tasks, I might want to share them in a tweet, and this allows enough room for a URL shortener link plus the description. I think the rest is pretty self-explanatory.

Now let's create our Repository. Well, that doesn't do anything... oh, but it does. And although it doesn't show it, because this application doesn't need it, there's a nifty method signature parser DSL that allows you to build queries just by writing a method signature.

Here's our Application. ... and pom for dependencies and stuff.

Here's the output of some curl commands I ran.

For a slightly more in-depth tutorial you can see the official Spring Data REST getting started page. In the future I'll try to write about how to actually connect to PostgreSQL and set up API authentication and authorization.

People are always telling me how verbose Java is, and how much less typing their language (especially Perl) is. I'd love to see a Perl app that can do all this in fewer lines of Perl (restriction: no line may be longer than 120 characters, and it must be humanly readable). I personally don't think it can be done at this time (not with full HATEOAS and as many response codes), but I'm waiting for the day it can, and can be structured this simply.

Jul 1, 2014

Writing deprecation notices in perl, optionally with Moose

Sometimes you want to remove behavior from your code in a future version, here's the right way to do it.

Here's the quick version of how it works. The before has to come after the attributes, because until then the methods aren't yet created. Using before also means it'll always run with your method, without actually touching your method, ensuring no accidental consequences to your method. The @CARP_NOT ensures that the warning thrown doesn't show a line number in your package, or from within where method modifiers are actually run. warnings::warnif( 'deprecated', ... ) ensures that these warnings are only emitted if you have the deprecated category enabled. But what if people don't have warnings enabled? Um... oh well? That's their problem, because what if people do, and they want to silence these until they can get to them? I highly suggest putting the name of the method being called and its successor into the message so that people know how to correct their code.

If you don't want Moose, just don't use the method modifier and put warnings::warnif directly in your code. If you're using a different AOP before, modify @CARP_NOT to have the correct module.

Jun 3, 2014

Java Privacy, broken by design

It is worth prefacing this by saying that none of the following arguments apply to anything using the keyword static, which makes things more procedural (or in some cases functional) than object oriented.

The suggestion in Java is to give the least required permission, but this, in my humble opinion, violates the Open-Closed Principle. Java has four privacy levels. Giving something the least permission required to function is fine in a Security context, privacy in programming however is simply there to discourage developers from doing stupid things. In most cases, unlike security, it only makes them difficult, not impossible. I believe that any SOLID principle should make your code more easily extensible, so while in fact Java's privacy is not in literal violation of Open-Closed, it does make extension more difficult than it otherwise should be, thus violating the spirit of the principle.

Before I continue on to how I think Java's design, and common usage, violates the Open-Closed Principle, I should explain how I interpret the principle, as my interpretation appears to be slightly different from what's on Wikipedia. The principle as described on Wikipedia appears to be combined with two other SOLID principles, namely Liskov Substitution and Interface Segregation. So first let's assume that the principle stands alone, and that although it'd be bad design to not be completely SOLID, Open-Closed by itself does not require a subclass to support the same interface. Let's also assume that never modifying the source to add features is an unrealistic expectation. The purpose of Open-Closed is to ensure that your subclasses are not modifying the structure or data of their parent classes, and that a child may easily add to, or change, the behavior it got from its parent (Liskov says that it must be substitutable for its parent).

First let's talk about final. Marking a class as final means you can't extend it. This is by definition a violation of Open-Closed, because the class is not open for extension. Classes such as UUID are marked final. You might ask, why would I want to extend a UUID? Maybe I want to give it a toURISafeBase64 method. That wouldn't break any of the original behavior, and belongs there almost as legitimately as representing the UUID as hex. What if I wanted to extend a nested final class like an Iterator on a Map? I can't do that, which means I have to completely reimplement the Iterator to add simple functionality. In fact, the way those are implemented, I have to implement much more than just the Iterator.
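Because UUID is final, the toURISafeBase64 idea cannot live in a subclass; the usual workaround is a static helper. A rough sketch, where UuidCodec is a hypothetical class name and the encoding choice (the 16 raw bytes, URL-safe Base64, no padding) is an assumption about what such a method might do:

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper: what you end up writing because UUID cannot be extended.
final class UuidCodec {
    private UuidCodec() {
    }

    // Encodes the 128 bits of the UUID as 22 URL-safe Base64 characters.
    static String toURISafeBase64(UUID uuid) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.putLong(uuid.getMostSignificantBits());
        buffer.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buffer.array());
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        System.out.println(uuid + " -> " + toURISafeBase64(uuid));
    }
}
```

The caller has to write UuidCodec.toURISafeBase64(id) rather than id.toURISafeBase64(), which is exactly the loss of extensibility being complained about.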

It is recommended by the official Java docs, and the community, to make member variables private unless otherwise necessary. Private variables are only accessible to the current class and nested classes; they are not visible to subclasses, in or out of the package. In my opinion this violates Open-Closed, because now, if I subclass, I need to reimplement all the fields or use getters/setters. Getters and setters for every single attribute are actually almost no better than the attribute itself, and an object that is nothing more than those is an Anemic Domain Model. Now it could be argued that making subclasses call methods makes them more... impervious to change, because if you change the data structure you can preserve the methods. The problem is that most classes don't use their own getters internally, which breaks this, because then overriding a getter won't actually modify the class as completely as desired. Also remember that subclasses are, by definition, tightly coupled; changes to the superclass usually require taking a look at the subclasses. So if you are using getters and setters to ensure extensibility and to insulate against internal/external interface changes, use them exclusively, meaning only they can have raw access; all constructors and business logic methods must go through them. At that point they are the replacement for direct member access and private won't matter as much (I will probably advocate a variant of this in the next article). However, if you still want to access some member data hidden by the class directly, you should ensure that your subclasses can easily do so as well. You should only make a member private if exposing it would actually cause a bug in any subclass.
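The difference between using your own getter internally and bypassing it can be sketched as follows. Price, RoundedPrice, and both label methods are hypothetical names; the sketch only routes reads through the getter, it does not cover setters or constructors.

```java
// Hypothetical parent with a private field and a getter.
class Price {
    private final int cents;

    Price(int cents) {
        this.cents = cents;
    }

    int getCents() {
        return cents;
    }

    // Business logic that goes through the getter: overrides take effect.
    String label() {
        return getCents() + " cents";
    }

    // Business logic that reads the field directly: overrides are ignored.
    String rawLabel() {
        return cents + " cents";
    }
}

// Hypothetical subclass that changes behavior by overriding only the getter.
class RoundedPrice extends Price {
    RoundedPrice(int cents) {
        super(cents);
    }

    @Override
    int getCents() {
        // Round down to a whole hundred.
        return (super.getCents() / 100) * 100;
    }
}

class GetterDemo {
    public static void main(String[] args) {
        Price price = new RoundedPrice(250);
        System.out.println(price.label());    // sees the override
        System.out.println(price.rawLabel()); // does not
    }
}
```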

So if we go on to assume that all subclasses, even ones in a different package (because you know people using your code are going to extend things), should be able to reach the members, then we should be making all members protected. This would mean that all subclasses could reuse the member variables. Of course the problem is that now your data is not encapsulated in your package: once a member variable is not private, it is available to your entire package. To me this also seems like a bad idea; other classes in my package don't need to see my object's internals unless they're a subclass. So now you have to choose: make all classes easily extended, or protect people who are programming in your package from themselves? You can probably control who's modifying your package and how, and have static code analysis check that you're only calling this.foo, never obj.foo. But nothing can give you back extensibility you've taken away (outside of adding it back).

So let's look at interfaces. Interfaces generally have two options: public, or package-private. This is fine, but has a problem: package-private interfaces are only applicable to the package that has the interface defined, and methods implementing an interface must be public. Most of the time what I actually want is an interface which I've defined globally as a contract, but whose implementations can only be called by their package. For example, DAOs (Data Access Objects) might be able to share the same interface (with judicious use of generics) between entities. However, if you do this, you will find that your interface must be public so it can be shared between packages, and now the DAO itself must have these methods as public, even if it's only being called by something in the same package. I don't see how you can get around this, whether you use package by feature or package by layer. If you follow this through with previous design thoughts, such as everything is an interface, and those end up being public, and you want nice subclassability, whether through protected members or through interfaced getters/setters, now everything is public and we've completely lost any real encapsulation.

So how could it be done better? Have a privacy level, subclass, which makes the method or member available only to subclasses and not throughout the package. Allow interfaces that have global definitions, but whose method implementations can be at a package or subclass level. I feel like this could be accomplished, perhaps by creating an interface type that is a "contract", and a new privacy keyword for "subclass". Contracts could define that methods be subclass, or protected, in their implementation. At that point you could have all kinds of methods that are still hidden from the general world. You could then build package by feature, have all methods that are required within the package have contracts, but share contracts between features, so all CRUD controllers would have the same method signatures, all repositories would share signatures, etc, etc.

What if I actually want more privacy? Well, you could not share interfaces between packages, and then have interfaces not be public. You could also not use an interface at all unless it's for a method on your bounded context that must be public. You can also say that ease of extensibility is not a goal and continue to not use your getters/setters internally, and yet make your members private.

You could also say, privacy is irrelevant, if the language is then preventing good, SOLID, design. Specifically here, Open-Closed, Liskov Substitution, and Interface Segregation. If you go this route you'll need conventions, and to trust other developers, because a lot of things will be public or protected. I recommend Perl's convention of prefixing subclass private methods with _ and assuming that all member fields are subclass/trait private and should never be called outside of their inheritance hierarchy.

May 6, 2014

Two Hundred Posts

My blog is 6 years old, with 200 posts and over 120k hits. Probably my first interesting post is the one where I decided I was switching to git from svn, and it's not very interesting, and I think much more poorly written than what I write now. Since then I've re-skinned the blog to new templates at least twice. I now list books that I recommend on the right side of my blog, and I've ensured that all content is clearly licensed under the Creative Commons. Personally I've moved from being a student, to system administrator, to Perl developer, and am now building things with Java and potentially Ruby.

Given that I'm now building things in Java and Ruby, there may be posts about those technologies and the good and bad things I've found out about them. One thing I'll say is that some of the hate for Java as a language is as unfounded as the hate for Perl. All languages have good and bad things about them; even Perl 6 has warts in its design.

Since Java is my full time job now, and I have little reason to be doing Perl as I've been unhappy with the Perl 5 framework landscape, I'm unlikely to continue developing features for my Perl 5 modules. If you're interested in becoming a comaintainer on any of my modules, my requirement is that you show interest in the module by contributing high quality patches to that module. I'd like to see evidence that I won't have to come back and fix things later, and that your interest is sincere. If you're not interested in being a comaint, patches are still welcome.

I haven't found frameworks that I'm completely happy with in any other language either. At this point I'm considering making a very minor project of developing a full framework for Perl 6. This framework (probably split into components) would be built on a new Dependency Injection module, using what I've learned from Bread::Board, AngularJS, and Java's CDI. It would also include an ORM that is based on Data Mapper principles and makes heavy use of introspection. I would like to mention I have some doubt about making serious traction myself, but we'll see.

Apr 2, 2014

REST, ROA, and HATEOAS often leads to bad webservice design

This is not to say that they are bad, but I find that all too frequently the resulting APIs are poorly designed due to forgetting one thing: RPC (Remote Procedure Call) is expensive. Now by RPC, I do not mean custom messaging formats such as SOAP or XML-RPC, I mean calling a method on a remote server. Do not think that just because you are using HTTP as the message format with something like XML or JSON, calling GET /resource is all that different from calling get_resource in a SOAP call. The frequent idempotence also does not mean that you're not actually doing RPC, as good server-side method design often implies idempotence too, e.g. adding an object to a Set in Java will not result in the object being added twice if you add it twice. All calls to a remote are a form of RPC. The most expensive part of RPC is creating a new connection; just how expensive depends on the protocol. This is why web sockets, for instance, are much cheaper than repeated calls (there are other reasons and expenses too, like maintaining many connections).

I've worked with a few Resource Oriented Architecture (ROA) web services, and they each suffered from the same flawed design: an excessive number of RPC calls was required to do seemingly simple tasks. This is caused by the misguided belief that every single aggregate should be its own resource, that components of the aggregate should also have their own resources, and that those should be the only access to the underlying aggregate. In one case, working with an ROA, we had to do about 5 RPC calls for every single product we wanted to create, and we were bulk creating. This problem was aggravated by the lack of an idempotent PUT for most resources.

The reality is, with a good API design we could have created all of the objects we needed with a single API call to a bulk interface. I'm talking about the RESTful equivalent of a Java Collection.addAll( objs ). In fact, if you use addAll on a Set, repeating the same call is idempotent: the same object will not be added twice. It would be really easy to write this given a good ORM and a good interface, so that you could do a POST or PUT to /entities. This is a significant improvement over a design where you'd have to do a PUT or POST for every single item you wanted to create. DELETE may be the only place where I'd consider not doing a bulk request, and it can generally be completed asynchronously. You may of course consider limiting the number of entities acted on in a request, so if you need to create 1000 entities, it might take 10 requests doing 100 at a time; this is still better for both the client and the server than doing 1000 requests.
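The Set.addAll behavior this design leans on can be sketched in a few lines. EntityStore is a hypothetical in-memory stand-in for whatever sits behind a bulk /entities endpoint, not a real service.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical in-memory store standing in for the server side of a bulk call.
class EntityStore {
    private final Set<String> entities = new LinkedHashSet<>();

    // One call for the whole batch; returns how many entities were new.
    int addAll(Collection<String> batch) {
        int before = entities.size();
        entities.addAll(batch);
        return entities.size() - before;
    }

    int size() {
        return entities.size();
    }

    public static void main(String[] args) {
        EntityStore store = new EntityStore();
        Collection<String> batch = Arrays.asList("a", "b", "c");

        // Sending the same batch twice leaves the store unchanged the second time.
        System.out.println("first call created:  " + store.addAll(batch));
        System.out.println("second call created: " + store.addAll(batch));
        System.out.println("total entities:      " + store.size());
    }
}
```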

The choice between PUT and POST depends on whether you believe that the call to GET must return the exact same view as PUT, meaning that a PUT would delete resources not included (for a single aggregate that's probably true), or should the behavior be equivalent to addAll or replacing the reference to the collection with a new one. Remember PUT must be idempotent, this only means that subsequent calls using the exact same arguments should result in the exact same result. You may want to consider using a different URI for manipulating your entity collections in these ways.

Another problem we encountered with a web service is that it had sub-resources, akin to tags, that had to exist prior to creating the resource we needed to create. Not having an idempotent PUT for that resource meant we were doing create, and on exception, update. But given the simplicity of this resource, it would have been even better to just allow the API to take the final object representation of that resource, instead of requiring the id, and do a lookup by name, or a create or update, under the hood. Doing this is more difficult logic-wise, and impossible if there's no natural key (because you can't look it up).

You are probably asking yourself, but how do I handle errors for these things? Well, the way I see it you have three options. One, requests are a transaction: you wrap your database code in a transaction, and it either succeeds or fails. You can return a 200 on success and, to ensure HATEOAS, include links to any new resources in the response. Two, you could allow partial success, and return the successful objects. Three, you could return a custom message envelope payload; this isn't very RESTful because it's a protocol on top of HTTP (it's more like SOAP).

I'm currently working on designing a new REST web service, and I've decided that no page load, or "single conceptual action", should take more than 6 API requests. This number is not arbitrary; it's the median concurrent connection amount, per host name, for consumer web browsers. Even that number is too many, but I felt that I needed to allot more than one request, due to some completely different actions that may need to occur on a page load.

Keep on with the resource-oriented REST with HATEOAS, just try to think of how to minimize the number of calls by designing less granular resources.

Mar 13, 2014

Matching Hex characters in a Regex

I've noticed a common problem with regular expressions and Hex Characters, so I thought I'd blog about it. The most common way to regex a UUID, or SHA1 or some other hex encoded binary value is this (and I've seen this in Perl libraries and StackOverflow answers).

[a-f0-9] or [A-F0-9]

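Neither of these is correct, as hex is case insensitive and both of these regexes are case sensitive. Hex is most commonly lowercase (unless you're Data::UUID), but that's an aesthetic, not a requirement. The best way to match hex is using a POSIX character class. (Some engines, such as Vim's, also offer \x as a shorthand; in Perl, \x starts a hex escape like \x41, so it is not a substitute there.)

[[:xdigit:]]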

which matches the same characters as the following, but in a more readable, intent-driven manner

[A-Fa-f0-9]
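A rough sketch of case-insensitive hex matching in Java, where the POSIX class is spelled \p{XDigit}. HexMatchDemo and looksLikeUuid are hypothetical names, and the 8-4-4-4-12 layout is just the usual textual UUID form, not a full validity check.

```java
import java.util.regex.Pattern;

class HexMatchDemo {
    // 8-4-4-4-12 hex digits; \p{XDigit} accepts both upper and lower case.
    private static final Pattern UUID_PATTERN = Pattern.compile(
            "\\p{XDigit}{8}-\\p{XDigit}{4}-\\p{XDigit}{4}-\\p{XDigit}{4}-\\p{XDigit}{12}");

    // True if the whole string has the textual shape of a UUID.
    static boolean looksLikeUuid(String candidate) {
        return UUID_PATTERN.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUuid("123e4567-e89b-12d3-a456-426614174000"));
        System.out.println(looksLikeUuid("123E4567-E89B-12D3-A456-426614174000"));
        System.out.println(looksLikeUuid("123g4567-e89b-12d3-a456-426614174000"));
    }
}
```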

As a side note, it's this in a regex string in Java

"\\p{XDigit}"