Mixing layers and a 1 character error

I recently spent a great deal of time resolving an issue that was make much more difficult because the software architecture had chosen to mix different layers of responsibility.

The Error

[ActionsApi] [Get] entity conformance check failed: document of type class org.apache.openwhisk.core.entity.WhiskTrigger did not match expected type class org.apache.openwhisk.core.entity.WhistActionMetaData. o.a.c.c.Controller$

Huh? What?

This error came from a Kubernetes installation of OpenWhisk. Due to a configuration error, this installation was not running serverless actions correctly. Because this was in a new environment, it was not possible to simply revert the changes I had made back to the previous version - it would not run in that configuration either, so I had to put a lot of effort into determining what had gone wrong.

The database

Serverless functions are stored in a CouchDB instance. Because this was a new install, one avenue to explore was the possibility that the action was actually stored incorrectly. In other words, perhaps the installation steps ran incorrectly and stored the action in the database in a way that generated this error.

After figuring out the mapping by looking at the access logs for CouchDB, I manually got the action out the database with

curl [database host]:5984/test_whisks/whisk.system%2Fsamples%2FhelloWorld

Bummer, the JSON that was returned had the action right there: "entityType":"action".

Perhaps, there was also a trigger with the exact same name:

curl [database host]:5984/test_whisks_all_docs

Nope, no triggers with the reported name, so for now, I conclude it's not the database.

Find the Code

In ApiUtils, I found the part of the code generating the error:

  /**
   * Waits on specified Future that returns an entity of type A from datastore. Terminates HTTP request.
   *
   * @param entity future that returns an entity of type A fetched from datastore
   * @param postProcess an optional continuation to post process the result of the
   * get and terminate the HTTP request directly
   *
   * Responses are one of (Code, Message)
   * - 200 entity A as JSON
   * - 404 Not Found
   * - 500 Internal Server Error
   */
  protected def getEntity[A <: DocumentRevisionProvider, Au >: A](entity: Future[A],
                                                                  postProcess: Option[PostProcessEntity[A]] = None)(
    implicit transid: TransactionId,
    format: RootJsonFormat[A],
    ma: Manifest[A]) = {
    onComplete(entity) {
      case Success(entity) =>
        logging.debug(this, s"[GET] entity success")
        postProcess map { _(entity) } getOrElse complete(OK, entity)
      case Failure(t: NoDocumentException) =>
        logging.debug(this, s"[GET] entity does not exist")
        terminate(NotFound)
      case Failure(t: DocumentTypeMismatchException) =>
        logging.debug(this, s"[GET] entity conformance check failed: ${t.getMessage}")
        terminate(Conflict, conformanceMessage)
      case Failure(t: ArtifactStoreException) =>
        logging.debug(this, s"[GET] entity unreadable")
        terminate(InternalServerError, t.getMessage)
      case Failure(t: Throwable) =>
        logging.error(this, s"[GET] entity failed: ${t.getMessage}")
        terminate(InternalServerError)
    }
  }

But, because it's a function, Scala will be inlining the call and there will be no ApiUtils.class file to do debugging on. I ended up adding additional debug code to the function, but the reasons why and the complications of the environment are lengthy enough to have their own post.

Spray

With the additional logging on, I got to a new error from the JSON deserialization library spray:

spray.json.DeserializationException: requirement failed: memory 0 MB below allowed threshould of 512 B.

Again, huh?

The spray library allows clients to inject parsing rules and in this case, the OpenWhisk injects a memory size test to ensure that actions don't use too much memory.

Ok, quick search the code for a 0 in a configuration file. Nope.

Line by line

Now that I knew I was looking for a size check, back to all of my environment changes to see which one I messed up.

Checking each and every configuration file change for the new environment found the change that cause the error. The sequence leading up to this error:

The new environment I was deploying into had different min and max memory sizes for Kubernetes pods. OpenWhisk tries to create a pod to run an invocation and it has to meet the namespace quota limits. So double clicked on the 128 and changed it to 256. But the editor also selected the "m" so 128m became 256, and because I intended to change that line, I missed that the m had been dropped.
256, when rounded to megabytes is 0 - this is where the zero came from.
Spray swallowed that exception without reporting it and converted it to a deserialization error
OpenWhisk caught that exception and then, tried to deserialize the action into other types and it worked for a trigger action with does not have a min/max memory setting
OpenWhisk then complained that the type was incorrect

Boundaries Broken

Lessons we can learn from this:

If you are going to mix business logic (in this case actions should have limited sizes) into a parser, you need to verify that you can at least get that error reported.
If you are going to re-throw an exception with a new type you should pass along the cause so that the logging layer has the chance to report on the nested error
If you have some parts of a code base that allows for parsing errors and retry logic and you have strict type checking elsewhere, you need to do some flow analysis to make sure you get the right error.
Finally, as always it's usually the change you made and not the system. True, in this case the reporting of the error was very difficult to track down, but it was still my own typo that caused the whole chain reaction to start.

Boundaries in Dev

Search This Blog

Mixing layers and a 1 character error

The Error

The database

Find the Code

Spray

Line by line

Boundaries Broken

Comments

Post a Comment

Popular posts from this blog

Spring Boot native builds when internet downloads are blocked made simple

Kotlin Notebook when you're blocked from Maven Central

Active vs. Passive Log4jShell remediation