Apr 2, 2014

REST, ROA, and HATEOAS often lead to bad web service design

This is not to say that they are bad, but I find that all too frequently the resulting APIs are poorly designed because one thing is forgotten: RPC (Remote Procedure Call) is expensive. By RPC I do not mean custom messaging formats such as SOAP or XML-RPC; I mean calling a method on a remote server. Do not think that just because you are using HTTP as the message format with something like XML or JSON, calling GET /resource is all that different from calling get_resource in a SOAP call. Frequent idempotence doesn't mean you're not doing RPC either, since good server-side method design often implies idempotence as well, e.g. adding an object to a Set in Java will not result in the object being added twice if you add it twice. Every call to a remote server is a form of RPC. The most expensive part of RPC is creating a new connection; exactly how expensive depends on the protocol. This is why WebSockets, for instance, are much cheaper than repeated calls (there are other reasons and expenses too, like maintaining many connections).

I've worked with a few Resource Oriented Architecture (ROA) web services, and they each suffered from the same flawed design: an excessive number of RPC calls was required to do seemingly simple tasks. This is caused by the misguided belief that every single aggregate should be its own resource, that components of the aggregate should also have their own resources, and that those should be the only access to the underlying aggregate. In one case, working with an ROA, we had to do about 5 RPC calls for every single product we wanted to create, and we were bulk creating. This problem was aggravated by the lack of an idempotent PUT for most resources.

The reality is, with a good API design we could have created all of the objects we needed with a single API call to a bulk interface. I'm talking about the RESTful equivalent of Java's Collection.addAll( objs[] ). In fact, if you use addAll on a Set, repeating the same call is idempotent: the same object will not be added twice. It would be really easy to write this given a good ORM and a good interface, so that you could do a POST or PUT to /entities. This is a significant improvement over a design where you'd have to do a PUT or POST for every single item you wanted to create. DELETE may be the only place where I'd consider not doing a bulk request, and it can generally be completed asynchronously. You may of course consider limiting the number of entities acted on in a request; if you need to create 1000 entities, it might take 10 requests doing 100 at a time, but this is still better for both the client and the server than doing 1000 requests.
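The addAll-style bulk interface described above can be sketched quickly. This is a minimal illustration (in Python for brevity, not tied to any framework); EntityStore, bulk_put, and the use of "name" as the natural key are all hypothetical choices for the sketch:

```python
# Minimal sketch of an idempotent, addAll-style bulk endpoint handler.
# EntityStore and bulk_put are hypothetical names, not a real framework API.

class EntityStore:
    """In-memory store keyed by a natural key, so re-adding is a no-op."""

    def __init__(self):
        self._items = {}

    def bulk_put(self, entities):
        """Idempotently add every entity; return the keys actually created."""
        created = []
        for entity in entities:
            key = entity["name"]          # natural key for this sketch
            if key not in self._items:
                created.append(key)
            self._items[key] = entity     # same input always yields same final state
        return created

store = EntityStore()
first = store.bulk_put([{"name": "widget"}, {"name": "gadget"}])
again = store.bulk_put([{"name": "widget"}, {"name": "gadget"}])  # idempotent replay
```

Replaying the same bulk request leaves the store unchanged, which is exactly the Set.addAll property the paragraph above appeals to.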

The choice between PUT and POST depends on whether you believe that a GET after a PUT must return exactly the view that was PUT, meaning a PUT would delete resources not included (for a single aggregate that's probably true), or whether the behavior should be equivalent to addAll, or to replacing the reference to the collection with a new one. Remember, PUT must be idempotent; this only means that subsequent calls using the exact same arguments should produce the exact same result. You may want to consider using a different URI for manipulating your entity collections in these ways.
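The two collection semantics being contrasted can be shown side by side. A small sketch (Python, with hypothetical function names) of replace-style PUT versus merge-style POST on a collection resource:

```python
# Sketch contrasting PUT-as-replace with POST-as-merge on a collection.
# put_collection and post_collection are hypothetical names for illustration.

def put_collection(current, new_items):
    """PUT: the stored view becomes exactly what was sent (items not sent are gone)."""
    return {item["id"]: item for item in new_items}

def post_collection(current, new_items):
    """POST as addAll: merge new items in, keeping existing ones."""
    merged = dict(current)
    merged.update({item["id"]: item for item in new_items})
    return merged
```

Note that both are idempotent with respect to repeating the exact same call; the difference is only whether omitted items survive.
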

Another problem we encountered with one web service was that it had sub-resources, akin to tags, that had to exist prior to creating the resource we actually wanted to create. Not having an idempotent PUT for that resource meant we were doing create, then update on exception. Given the simplicity of this resource, it would have been even better to let the API take the final object representation of that sub-resource, instead of requiring the id, and do a lookup by name, or a create-or-update, under the hood. Doing this is more difficult logic-wise, and impossible if there's no natural key (because you can't look it up).
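The lookup-by-name, create-or-update idea can be sketched as well. This is a hypothetical illustration (TagRepository and resolve are invented names), assuming "name" is the natural key:

```python
# Sketch of resolving a tag-like sub-resource by natural key under the hood:
# look it up by name, create it if missing, update it if it already exists.
# TagRepository and resolve are hypothetical names.

class TagRepository:
    def __init__(self):
        self._by_name = {}
        self._next_id = 1

    def resolve(self, representation):
        """Accept the final object representation instead of requiring an id."""
        name = representation["name"]   # natural key; without one, lookup is impossible
        tag = self._by_name.get(name)
        if tag is None:                 # create
            tag = {"id": self._next_id, **representation}
            self._next_id += 1
            self._by_name[name] = tag
        else:                           # update in place
            tag.update(representation)
        return tag
```

The client just sends the representation; whether that turns into a create or an update is the server's problem, which is the point.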

You are probably asking yourself, but how do I handle errors for these things? The way I see it, you have three options. One, requests are a transaction: wrap your database code in a transaction so the request either succeeds or fails, return a 200 on success, and ensure HATEOAS with links to any new resources in the response. Two, you could allow partial success and return the successful objects. Three, you could return a custom message envelope payload; this isn't very RESTful, because it's a protocol on top of HTTP (it's more like SOAP).

I'm currently working on designing a new REST web service, and I've decided that no page load, or "single conceptual action", should take more than 6 API requests. This number is not arbitrary; it's the median number of concurrent connections per host name that consumer web browsers allow. Even that number is too many, but I felt I needed to allot more than one request, due to some completely different actions that may need to occur on a page load.

Keep on with Resource Oriented REST with HATEOAS; just try to minimize the number of calls by designing less granular resources.

Mar 13, 2014

Matching Hex characters in a Regex

I've noticed a common problem with regular expressions and hex characters, so I thought I'd blog about it. The most common way to match a UUID, SHA-1, or some other hex-encoded binary value with a regex is this (and I've seen this in Perl libraries and StackOverflow answers):

[a-f0-9] or [A-F0-9]

Neither of these is correct: hex is case insensitive, but both of these regexes are case sensitive. Hex is most commonly lowercase (unless you're Data::UUID), but that's an aesthetic choice, not a requirement. The best way to match hex is using a POSIX character class.

[[:xdigit:]] or \p{XDigit}

which matches the same thing as the following, but in a more readable, intent-driven manner

[A-Fa-f0-9]

As a side note, in a Java regex string it's

"\\p{XDigit}"
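The pitfall is easy to demonstrate. Python's re module has no POSIX [[:xdigit:]] class, so this sketch spells out the equivalent explicit class to show why the case-sensitive version silently rejects valid hex:

```python
import re

# The common-but-wrong lowercase-only class vs. the full case-insensitive class.
lowercase_only = re.compile(r'^[a-f0-9]+$')
full_hex       = re.compile(r'^[A-Fa-f0-9]+$')

mixed = "DeadBeef123"   # perfectly valid hex, mixed case
```

Matching `mixed` against `lowercase_only` fails, while `full_hex` accepts it, which is exactly the bug described above.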

Feb 27, 2014

The ShareDir Problem

Some of you may have noticed that a while back I converted Pod::Spell to use File::ShareDir::ProjectDistDir instead of keeping the wordlist in Pod::Wordlist::__DATA__. This move was made in conjunction with making Pod::Wordlist an object, and in preparation for a time when you'll be able to specify your own wordlist file. It was also made so that non-technical contributors could more easily update the wordlist without going near anything that looked like code.

So why shouldn't you put data in __DATA__? According to File::ShareDir:

Quite often you want or need your Perl module (CPAN or otherwise) to have access to a large amount of read-only data that is stored on the file-system at run-time. On a linux-like system, this would be in a place such as /usr/share, however Perl runs on a wide variety of different systems, and so the use of any one location is unreliable. Perl provides a little-known method for doing this, but almost nobody is aware that it exists. As a result, module authors often go through some very strange ways to make the data available to their code.

The most common of these is to dump the data out to an enormous Perl data structure and save it into the module itself. The result are enormous multi-megabyte .pm files that chew up a lot of memory needlessly.

Another method is to put the data "file" after the __DATA__ compiler tag and limit yourself to access as a filehandle.

The problem to solve is really quite simple.

1. Write the data files to the system at install time.
 
2. Know where you put them at run-time.

Knowing where you put them at run-time is actually still a problem, because we don't develop in the same spot that perl installs stuff. The first portion of the problem is "my tests can't find my sharedir file". Hence Test::File::ShareDir, which overrides the File::ShareDir method. People say "use Test::File::ShareDir, it solves the pain", but that's not true; they're missing a different pain. What happens if you're trying to run, say, bin/podspell from the git directory? Oh right, now it can't find the sharedir file again. In that case I could probably work around it, but it's a mild symptom of a greater problem I've encountered: people aren't deploying CPAN modules, they're deploying from git. Now I could say "not supported", but unfortunately I'd usually have to say that to my current boss or coworker, whoever that may be (and I tried it; it didn't work). This isn't actually the root of the problem with Pod::Spell, but I guarantee it was a problem with Business::CyberSource. Mostly I feel like leaving Pod::Spell this way is helping to weed out the issues people will have with File::ShareDir::ProjectDistDir.

So what do I think the solution is? There are obviously numerous "social" problems here that I don't think can be easily solved. I'm sure that Kent Fredric has a better grasp than I of the technical solutions. Though I have had one recurring idea, which is apparently not attainable without significant effort: have a searchable sharedir path, like a unix PATH, e.g. PERL5_SHAREDIR="./share:$DETECTED_DEV_DIR:$PERL5_LIB...", and try looking for the file in each directory of the path until you find it, then cache that location in memory so you only have to search once per run. This is probably not a good solution for various reasons, or perhaps it's grossly oversimplified in how it could work.
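The search-path-with-caching idea is simple enough to sketch. This is only an illustration of the concept (in Python; find_share_file is a hypothetical name, and no real File::ShareDir API is implied):

```python
import os

# Sketch of the "searchable sharedir path" idea: try each directory in a
# PERL5_SHAREDIR-like search path until the file is found, then cache the
# hit in memory so the search happens once per run.

_cache = {}

def find_share_file(filename, search_path, exists=os.path.isfile):
    """Return the first path containing filename; cache the answer per run."""
    if filename in _cache:
        return _cache[filename]
    for directory in search_path:
        candidate = os.path.join(directory, filename)
        if exists(candidate):
            _cache[filename] = candidate
            return candidate
    raise FileNotFoundError(filename)
```

The `exists` parameter is injectable purely so the lookup can be exercised without touching the real filesystem.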

Ultimately, there isn't a good solution right now, and I'm not sure we've actually thought of one.

Dec 1, 2013

Advent, good idea, but problematic execution

So advent is 24 days of high quality tutorials, and it's great, and ++ to all the people who write the articles. But I've got a problem... it never shows up in the feed that I read in Feedly (formerly read in Google Reader). This is compounded by the fact that there are many advents, each with their own yearly feed... so each year I have to poke around at the various projects to see if they're doing advent, and if so, subscribe to the feed. The solution: we just have the advents aggregated by ironman. This is a really cheap hack, but it would give the advents being created greater distribution than they are probably now getting. We could also just patch the various advent software to provide a feed that continues eternally year after year, instead of a new feed each year, which seems not so useful, and make sure we provide that link, instead of the "just this year" link, in the UI. I suppose I could go fix it... but at this time I'm not sure where the source code for advent is, nor whether each advent has its own software backing it; Perl 6 is using Wordpress, which doesn't have this problem. I suppose I could add the ones I find to ironman, but maybe the advent creators don't do that for a reason, so I'd rather not step on toes.

Nov 2, 2013

Would You Miss Autoderef in 5.20? Solutions in search of a problem

This is a response to chromatic's blog post Would You Miss Autoderef in 5.20?, because I haven't been able to get comments to work on his Movable Type install for something like a year (a 500 error, or some Blogger OpenID incompatibility).

In all honesty, I don't find either particularly interesting. I've too often been targeting 5.8 or 5.10 for syntax... @{ $foo } is really the most I've ever needed; @$foo is nicer, but beyond that I don't need it. I can't figure out the value of either autoderef or postfix deref; neither seems to be solving an actual pain point. I think perhaps they're a solution in search of a problem. Maybe I just need someone to point out a good use case that this stuff is solving.

Where are the things I actually need? Here's hoping that 5.20 will get method signatures, or exception handling, or maybe figure out how to get given/when out of experimental, something useful.

I really do appreciate all the hard work the people improving core perl are doing, and it's all needed. Things like __SUB__ and my sub {} are absolutely awesome, as is all the work on unicode and other general improvements. Maybe lexical subs will be moved to stable? I doubt it. Basically I want something I can point my friends outside of the echo chamber to, something they could look at and say, yeah, that's cool, Perl is moving forward.

Oct 23, 2013

Providing with Providers and Bread::Board

When I started using Dependency Injection, the following problem came up: how do I inject a dependency when the container is not accessible at that point? Ok, that sentence even confused me a little bit, so what do I mean? Let's say I have a Repository for Products that is injected into my controller. Each Product stored has one or more ProductVariants that are part of its aggregate, and each of those has nested Categories. Loading this entire graph at once would be relatively expensive, so we decide to do some lazy loading via DBI in the classes. One problem: how on earth do we inject a database handle all the way down to Categories? Most of the approaches below work against DI, but they are solutions to the problem, and there are also ways to combine them. Also, your model class having a database handle is probably bad design in itself, but I'm not going to get into that. Sadly, I've done every one of these.

Manual

Well, at least you aren't hard coding the way to read your config file, or your database driver, and you're smart enough to rely on an Interface rather than an Implementation. Still, this is fraught with problems. Firstly, if your web server (assuming it's a web application) is getting any kind of traffic at all, you'll end up creating tons of database connections; you'll also be reading that config file every time (ok, I forget whether Config::Merge caches to memory, it might, but often when I see people design this way they are basically slurping the file every time). Someday, 5 years from now, someone is going to hate you because now they need to support replicants... and the config needs to support more connection strings, which means modifying every place you've done this. Also, you've completely lost the ability to inject your dependencies for whatever reason you may want to.
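The original post's code samples aren't reproduced here, but the shape of the manual approach is easy to caricature. A rough sketch (Python, every name hypothetical) of what each class doing its own config read and connection looks like:

```python
# Sketch of the "manual" anti-pattern: every class slurps the config and
# builds its own connection. load_config, connect, and Category are
# hypothetical stand-ins, not a real driver or config loader.

CONNECTIONS_OPENED = 0

def load_config():
    # stands in for slurping a config file on every call
    return {"dsn": "dbi:hypothetical:db"}

def connect(dsn):
    global CONNECTIONS_OPENED
    CONNECTIONS_OPENED += 1           # each instance pays for its own handle
    return {"dsn": dsn}

class Category:
    def __init__(self):
        config = load_config()        # config re-read on every construction
        self.dbh = connect(config["dsn"])

categories = [Category() for _ in range(3)]   # three objects, three connections
```

Three objects, three connections, three config reads: scale that to real traffic and the problem described above is obvious.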

Inheritance/Composition

Ok, this is a little bit better than before; at least now you have inverted your dependencies, since you could provide the config or the database handle to the class. You've also put the code in a centralized place, so it's easy to change when you need to. You're still reading the file fairly often, though perhaps less often; it now depends on how long the ProductVariant is alive. So what happens if your connection is lost? We still have a connection for each class, a connection that may now be held much longer. And why does ProductVariant need access to the config? This is a violation of the Law of Demeter.

Naive Service Locator

We need to get rid of knowledge of the config. We can do this by using a Service Locator, which is simply a well-known service used to retrieve other services, usually a global singleton. In our example we're at least smart enough to allow ourselves to swap the class out via injection for testing. We no longer have tons of connections or config reads. However, we now have new problems: what happens when our application server forks a process and we lose the database connection? What about when our locator gets more complex, say with nested containers that could change how we access services, specifically with replication? Also, our class is now directly dependent on Bread::Board and its interface. At least we've stopped caring how our database handle is built. But our locator is a global singleton, and we can't change our Container class for testing.
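To make the shape of this concrete, here is a rough Python analog of a naive locator (Locator and ProductVariant are hypothetical names; the real post used Bread::Board):

```python
# Sketch of a naive service locator: a well-known global singleton that
# classes call directly to fetch their services. All names are hypothetical.

class Locator:
    _services = {}

    @classmethod
    def register(cls, name, factory):
        cls._services[name] = factory

    @classmethod
    def get(cls, name):
        return cls._services[name]()      # caller is now coupled to the locator

class ProductVariant:
    def __init__(self):
        # hidden dependency: nothing in the constructor signature reveals it
        self.dbh = Locator.get("dbh")

Locator.register("dbh", lambda: {"connected": True})
variant = ProductVariant()
```

The convenience is real, but so is the coupling: ProductVariant cannot be constructed at all without the global Locator being set up first.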

Robust Service Locator

Ok, so this is much better: we can now configure which locator instance we use at runtime. We have removed the dependency on the Bread::Board interface. There is no longer a problem with database connections being dropped. However, our container is still a global singleton, and our class still knows about it, which, again, violates the Law of Demeter.

Dependency Injection and Pass it down

Until now I've been basically ignoring the other classes, because with all of the previous approaches they aren't really a concern; you would do the same thing in every class: fetch your service. Much of the code is required here anyway; we would always have to do the SQL, the transforms, the loops. Dependency inversion is the opposite: do not think about how to retrieve the dependency, instead have the dependency provided. But this becomes tricky to think about when you're 3 or more levels deep in your hierarchy. One way to do it is to simply pass the reference down. We create a specific problem here: our Repository's lifecycle is a singleton, so we need to ensure re-connection, thus we must inject the connector, which means we are immediately dependent on the DBIx::Connector interface. This doesn't seem that tricky until you add more than one service. Even that may not seem so bad, until you have to add one later, and oh my god, now you're modifying several classes.
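A rough sketch of "inject it at the top and pass it down" (Python, hypothetical names; the connector object stands in for something DBIx::Connector-like):

```python
# Sketch of "inject and pass it down": the connector is handed to the top
# of the hierarchy and threaded by hand through every level below it.

class Category:
    def __init__(self, connector):
        self.connector = connector        # three levels deep, still passed by hand

class ProductVariant:
    def __init__(self, connector):
        self.connector = connector
        self.categories = [Category(connector)]

class ProductRepository:
    def __init__(self, connector):
        self.connector = connector        # singleton lifecycle owns re-connection

    def load(self):
        return ProductVariant(self.connector)

connector = object()                      # stands in for a DBIx::Connector-like object
repo = ProductRepository(connector)
variant = repo.load()
```

Every constructor in the chain now mentions the connector; adding a second service later means touching all of them, which is exactly the complaint above.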

Dependency Injection with Providers

This next and final sample shows one way of doing this with Providers. A little context on Providers first: a Provider is simply an object that can be used to retrieve an instance of an object you need. It's really just a kind of factory, but one specific to dependency injection, for scenarios where you need a new instance of an object each time. It might also work well for other cases, such as objects with a lifespan longer than a new instance on every request from the injector, but shorter than a permanent singleton. In short, a provider should be able to provide you with an instance on request, without requiring you to depend on how it is retrieved.
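The provider idea, stripped to its essentials, looks something like this (a Python analog with hypothetical names; the original sample used Bread::Board):

```python
# Sketch of a provider: an object that hands back a fresh instance on
# demand, so deep classes depend on "something that provides" rather than
# on how retrieval works. Provider, Category, make_dbh are hypothetical.

class Provider:
    def __init__(self, factory):
        self._factory = factory

    def get(self):
        return self._factory()            # new instance per request, by design

class Category:
    def __init__(self, dbh_provider):
        self._dbh_provider = dbh_provider

    def load_children(self):
        dbh = self._dbh_provider.get()    # fetched only when actually needed
        return {"dbh": dbh}

counter = {"built": 0}

def make_dbh():
    counter["built"] += 1
    return {"handle": counter["built"]}

category = Category(Provider(make_dbh))
```

Category knows only that it was given something with a get method; it knows nothing about configs, containers, or connection management.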

The code that I'm demonstrating will not currently work in a practical scenario, meaning one where variant parameters are required. I've opened a bug about resolving the issue. In the meantime, the patch is simple and you could apply it yourself. You could use BUILDARGS to rename an alternate key to the primary hash key in your models. You could also just define each model service one at a time instead of looping over them, and actually validate their parameters.

You may note that I've removed the config; this was simply so I could build the code out so it works in its entirety. It may be advantageous not to put config processing code in the dependency injector, but rather provide the config to Bread::Board::Declare at the constructor via required services. This way of doing things requires much more code, but is also much more flexible. Every piece of the model, even those that could not normally be accessed by the injector, can now have its dependencies injected into it.

Sep 2, 2013

Thinking of presenting at YAPC::NA 2014

So I'm thinking of proposing some talks for YAPC::NA Orlando, and/or maybe doing some training. Here are my thoughts on what I could do that would be a contribution and different from other talks. For training it might just be a combination of all of the concepts I could do as individual talks. Basically the idea is "I've learned Perl and Moo[se], now how do I build a large application?"
  • UML
  • SOLID Object Oriented Design
  • Design Patterns
  • Domain Driven Design
  • Patterns of Application Architecture
  • Service Oriented Architectures, REST, ROA, RPC (including RESTful RPC and Resource Oriented RPC), and Pub/Sub
    • ORM Patterns ( Active Record / Data Mapper / Transaction Script )
    • MVC
    • Layered Architecture
    • Ports and Adapters
  • Dependency Injection ( with Bread::Board )
Let me know your thoughts.

Aug 5, 2013

Pod::Spell maintained but could use more hands

Just before my abrupt departure from my former employer, I took over maintainership of Pod::Spell. I have started working to clean up the code, modernize and add tests, and improve the wordlist. There is much to be done on this front. More tests are needed to ensure no accidental breakage; there's possibly a unicode bug lurking within Pod::Spell; more words are needed for the wordlist. Patches are welcome, as I don't have all the time in the world to work on it.

Jul 6, 2013

Changing default behavior of File::chmod

File::chmod has been around for a long time, is really stable, and really hasn't changed since 1999. It is far more user friendly than the chmod() in core Perl. I recently used it for an interview test. It took me a few tries to get right, however, because its default behavior in symchmod() mode is to use the system's umask. I find this to be very confusing behavior. I actually thought it was a bug at first, and asked for comaint since it hadn't been updated in so long. Now that I realize it's intentional, I'm unsure how best to proceed. On one hand I believe that the most obvious behavior (mimicking unix chmod) should be the default; on the other, changing something that has been around this long... So I'm writing this blog post. What do you think? Should I preserve the behavior? If not, I'm aiming for a long deprecation cycle. Unfortunately, because it used package variables for this setting, I haven't come up with a way to deprecate it code-wise that won't be annoying.

Regardless of what I do with this, there'll be a new release that has proper metadata, tests rewritten with Test::More, etc.

May 25, 2013

Moose Interface Pattern with parameter enforcement

Moose interfaces are problematic, for two reasons.

1. They are checked at compile time, but runtime features such as attribute delegation could provide the interface (role ordering is the real problem here).
2. They don't ensure anything other than the method name.

I think this problem can be solved better by using around instead of requires. Ordering of course still matters here, as you can have multiple `around` modifiers on a method. This will throw an exception if the method is missing, or if the types passed in are not correct.
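The original Moose code isn't shown here, but the idea translates to other languages. A rough Python analog (around_enforce and Speller are invented names): instead of merely requiring that a method exists, wrap it "around" so its arguments are validated before the real implementation runs:

```python
import functools

# Python analog of enforcing an interface's parameters with an "around"
# wrapper rather than a bare "requires". All names here are illustrative.

def around_enforce(param_type):
    """Decorator enforcing a parameter type, akin to an `around` modifier."""
    def modifier(method):
        @functools.wraps(method)
        def wrapped(self, arg):
            if not isinstance(arg, param_type):
                raise TypeError(f"{method.__name__} expects {param_type.__name__}")
            return method(self, arg)   # delegate to the real implementation
        return wrapped
    return modifier

class Speller:
    @around_enforce(str)
    def check(self, word):
        return word.lower()

speller = Speller()
```

As with stacked `around` modifiers in Moose, stacking several such wrappers makes their ordering significant.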