Rafał Wrzeszcz - Wrzasq.pl

Java+AWS Lambda+Spring - a little less wrong way

Sunday, 07 May 2017, 23:00

The FaaS approach is still quite fresh, and developers keep coming up with better and better solutions for handling it. One of the leading cloud services providing function applications is AWS Lambda, and here I will describe my experience with it, but I'm pretty sure these clues apply regardless of which provider you use. Obviously, when a developer faces a new environment like this, the easiest way is to try it out with a familiar technology stack. And no matter how improper it seems to be, you can handle it with Java. And when it's Java, you will very likely also think of using Spring to help you with it. How? Why? Why not?

Under the hood

In order to figure out the best approach, we need to understand what we are dealing with. AWS runs your function code within a container. After your function ends, the container is not stopped immediately but stays around for some time, and further executions of your function will use the very same container - in this case your code is already bootstrapped and the invocation starts faster. However, it's not guaranteed - containers won't stay there forever, and when you execute multiple invocations in parallel AWS will spawn more containers to handle them. Container management is a total black box for you - you should not make any assumptions regarding how your lambda is invoked.
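To see container reuse in practice, a minimal sketch like the one below can help. It relies only on the fact that static state lives as long as the JVM inside a container, not as long as a single invocation. The class name, the handler signature and the "cold"/"warm" labels are made up for illustration and are not part of any AWS API.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ContainerReuseProbe
{
    // Static state is created once per JVM (so once per container)
    // and is shared by every invocation that lands in that container.
    private static final AtomicInteger INVOCATIONS = new AtomicInteger();

    // Hypothetical handler: tells whether this call was the first one in this JVM.
    public static String handle(String input)
    {
        int count = INVOCATIONS.incrementAndGet();
        return (count == 1 ? "cold" : "warm") + " #" + count;
    }

    public static void main(String[] args)
    {
        // Two calls in the same JVM simulate two invocations hitting one container.
        System.out.println(handle("first"));
        System.out.println(handle("second"));
    }
}
```

Logging such a marker from a real function shows how often invocations actually hit a fresh container, but as said above, the reuse pattern is not something your logic should depend on.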

Dependencies optimization

Of course, dropping unneeded dependencies is good practice in general, but in libraries and regular applications it's more of a maintenance problem - once a dependency is loaded it doesn't usually affect your code, so it doesn't generate runtime overhead. But as lambdas can be spawned over and over and bootstrap time can play a significant role, reducing lambda size is very important. I have seen lambdas that are over 70MB (sic!) and sometimes take 3 seconds to boot (not even counting code-level bootstrap, like Spring initialization)! A reduced lambda can load in ~1s, with room still left for improvement.

So how do you best prepare your lambda code and package? First of all: drop unneeded dependencies - the lighter your package, the faster it will load (which means a faster first execution). It can also reduce memory consumption, as some dependencies bring implicit side effects. Obviously you should include only the dependencies that you really use - many build tools can check this automatically (e.g. in Maven you can use the dependency:analyze-only goal).

But that's not the end - you probably still don't need a huge portion of what ends up in your package anyway as transitive dependencies. Lambdas are meant to solve a single problem (they are literally functions - you should treat them with a functional programming approach and try to solve just one problem in each), so you will usually exercise just a single path through your libraries, which leaves a bunch of libraries that are never involved in your logic and are useless to you. Let's take an example - let's say you use a REST client with the following methods:

public class SomeServiceClient
{
    // createdAt is Joda Time type
    public List<SomeResource> getSince(DateTime createdAt)
    {
        // …
    }

    // return type here is Spring Data Commons Page<>
    public Page<SomeResource> getSincePaged(DateTime createdAt, long page)
    {
        // …
    }

    public void remove(UUID id)
    {
        // …
    }
}

If your lambda uses only the remove(java.util.UUID) method, you can exclude Joda-Time and Spring Data Commons from your dependencies:

        <dependency>
            <groupId>org.some.service</groupId>
            <artifactId>some-client</artifactId>
            <version>0.0.1</version>
            <exclusions>
                <exclusion>
                    <groupId>joda-time</groupId>
                    <artifactId>joda-time</artifactId>
                </exclusion>

                <exclusion>
                    <groupId>org.springframework.data</groupId>
                    <artifactId>spring-data-commons</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

Analyze your package

Still too heavy? You may try to analyze what really makes your package heavy. I've managed to reduce my lambda package from ~40MB to "just" 13MB, and I'm still shrinking it. To do that, you have to extract your package and check which libraries cost you the most space. JARs are just ZIP files with a certain directory structure inside, so you can use standard tools to work with them (of course you will have to go deeper into subdirectories to see particular packages):

$ unzip your-package-standalone.jar > /dev/null && du -sh *
3,0M    ch
72K     changelog.txt
15M     com
436K    feign
4,0K    framework.properties
28K     io
88K     it
16K     license.txt
4,0K    logback.xml
924K    META-INF
16K     mime.types
3,9M    models
180K    mozilla
4,0K    notice.txt
34M     org
112K    pl
4,0K    readme.txt
2,0M    software
14M     your-package-standalone.jar
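If you prefer not to extract the archive, roughly the same breakdown can be computed straight from the JAR. The following is only a sketch using the standard java.util.zip API; the class and method names are made up for this example, and it sums uncompressed sizes, so the numbers will differ slightly from what du reports for disk usage.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class JarSizeReport
{
    // Sums uncompressed entry sizes per top-level directory (or top-level file).
    public static Map<String, Long> sizesByTopLevel(String jarPath) throws IOException
    {
        Map<String, Long> sizes = new TreeMap<>();
        try (ZipFile jar = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                String name = entry.getName();
                int slash = name.indexOf('/');
                String top = slash < 0 ? name : name.substring(0, slash);
                // getSize() returns -1 when the size is unknown; count that as 0
                sizes.merge(top, Math.max(entry.getSize(), 0L), Long::sum);
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException
    {
        if (args.length == 0) {
            System.out.println("usage: java JarSizeReport <path-to-jar>");
            return;
        }
        // Print the heaviest top-level entries first.
        sizesByTopLevel(args[0]).entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .forEach(e -> System.out.printf("%10d  %s%n", e.getValue(), e.getKey()));
    }
}
```

To drill down into a heavy directory such as org, the same idea works with a longer prefix instead of only the first path segment.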

Another way to reduce the size of the resulting JAR is to let your build tool minimize it - for example, the shade:shade goal for Maven provides a minimizeJar option. However, I have not played with it yet, as it can cause unexpected side effects for classes that are only used dynamically (e.g. loaded by reflection), which is often the case when you work with JSON. So keep in mind that this is very tricky.

Using Spring in lambda

Now the most controversial part - how to use Spring in a lambda? My best answer would be: simply don't. Did I mention the functional programming approach? Your lambda should be small, and you probably don't need your whole complex server app architecture booted to solve a single problem. Using the regular Spring approach in a lambda brings two major problems - it requires a lot of dependencies (especially if you use Spring Boot) and it takes a lot of time to create a context (filled mostly with stuff your lambda doesn't need). But let's go step by step through how to (not) use Spring there. At first, let's add a Spring context to any lambda of yours:

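import com.amazonaws.services.lambda.runtime.events.SNSEvent;
import org.springframework.boot.builder.SpringApplicationBuilder;
import org.springframework.context.ConfigurableApplicationContext;

public class MyLambdaHandler
{
    public static void handle(SNSEvent event)
    {
        // `activeProfiles` stands for your profile names and has to be defined elsewhere;
        // `.web(false)` is the Spring Boot 1.x call, newer versions take a WebApplicationType
        ConfigurableApplicationContext springContext = new SpringApplicationBuilder(MyLambdaHandler.class)
            .profiles(activeProfiles)
            .web(false)
            .run();

        // your logic here - use `springContext`
    }
}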

This is actually very bad - even if you want to continue with the full Spring stack, this already brings the huge overhead of context initialization for every lambda execution. This is completely unnecessary - you can re-use the context for every consecutive execution in the same container:

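import com.amazonaws.services.lambda.runtime.events.SNSEvent;
import org.springframework.boot.builder.SpringApplicationBuilder;
import org.springframework.context.ConfigurableApplicationContext;

public class MyLambdaHandler
{
    private static ConfigurableApplicationContext springContext;

    static {
        // runs once per container, when the class is loaded
        // (`activeProfiles` stands for your profile names and has to be defined elsewhere)
        MyLambdaHandler.springContext = new SpringApplicationBuilder(MyLambdaHandler.class)
            .profiles(activeProfiles)
            .web(false)
            .run();
    }

    public static void handle(SNSEvent event)
    {
        // your logic here - use `MyLambdaHandler.springContext`
    }
}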

Dropping Spring overhead

Now - why would you think about using the whole Spring stack in your lambda at all? Most likely you have worked with Spring for years; you develop a larger project with Spring, and one or more lambdas are just additional elements of a system that already contains components developed in Java that autowire each other, binding all of the required pieces together and providing the logic you need in a one-liner or at most a few lines. But this is a lambda - single-purpose logic. You probably don't need Spring application event handlers, message sources or custom property sources.

Probably one of the most commonly used Spring features when developing components is autowiring. You can use it in three ways - property autowiring (annotating properties), setter autowiring (annotating methods that take a dependency as an argument) and constructor autowiring (annotating a constructor that takes dependencies as arguments). Only with the first type are you tightly coupled to Spring - the second and third ways merely automate the wiring and get out of the way when needed. So if you develop your classes with setter/constructor autowiring (if you use property autowiring, you can usually just switch by implementing setter methods), a Spring context is not needed, and if your lambda is not too large, wiring everything together by hand doesn't bring much overhead:

public class MyLambdaHandler
{
    private static MyServiceClient client;

    static {
        ObjectMapper objectMapper = new ObjectMapper();
        // configure object mapper

        MyLambdaHandler.client = new MyServiceClient(objectMapper);
    }

    public static void handle(SNSEvent event)
    {
        // your logic here - use `MyLambdaHandler.client`
    }
}

(Re-)using @Configuration classes

Often you put your bean initialization (and wiring) into @Configuration-annotated classes (Java config style). In this case, because annotation classes are not required to be on the class path if you don't use them, you can most likely also re-use the existing classes directly in your code as factories for your components:

@Configuration
public class YourConfig
{
    @Bean
    @Lazy
    public ObjectMapper objectMapper()
    {
        ObjectMapper objectMapper = new ObjectMapper();
        objectMapper.registerModule(new JavaTimeModule());
        objectMapper.configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false);
        objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        return objectMapper;
    }
}

So even though this class is annotated to be recognized/discovered by Spring, you don't need Spring at all:

public class MyLambdaHandler
{
    private static MyServiceClient client;

    static {
        MyLambdaHandler.client = new MyServiceClient(
            // use your favourite beans source
            new YourConfig().objectMapper()
        );
    }

    public static void handle(SNSEvent event)
    {
        // your logic here - use `MyLambdaHandler.client`
    }
}
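One caveat worth keeping in mind: inside a Spring context each @Bean method normally yields one shared instance, while a plain Java call such as new YourConfig().objectMapper() builds a new object every time it is invoked. If several of your components should share one instance, you have to keep it yourself. Below is a minimal sketch of one way to do that; the PlainFactories class and its memoize helper are made up for this example and are not part of Spring.

```java
import java.util.function.Supplier;

public class PlainFactories
{
    // Wraps a factory so that it is invoked at most once; later calls reuse the result.
    public static <T> Supplier<T> memoize(Supplier<T> factory)
    {
        return new Supplier<T>() {
            private T instance;
            private boolean created;

            @Override
            public synchronized T get()
            {
                if (!created) {
                    instance = factory.get();
                    created = true;
                }
                return instance;
            }
        };
    }

    public static void main(String[] args)
    {
        // StringBuilder only stands in for a bean such as the ObjectMapper built by YourConfig.
        Supplier<StringBuilder> shared = memoize(StringBuilder::new);
        System.out.println(shared.get() == shared.get()); // prints "true"
    }
}
```

For a single dependency, a static field in the handler (as in the listing above) does the same job with less ceremony; a helper like this only starts to pay off when several factories depend on each other.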

Measure your resources usage

And the last point (not only related to Java and Spring) - monitor your λ performance. AWS Lambda is billed per execution time and also per memory allocation. The second parameter is something that you explicitly define for your function and within which your code needs to fit. To optimize it, set it as low as possible for your case. After each execution, you can go to CloudWatch and see the logs, which state how much memory your execution used:

REPORT RequestId: ef41d5ef-d35c-11e6-a763-d72726f3e46e       Duration: 1276.37 ms    Billed Duration: 1300 ms Memory Size: 512 MB    Max Memory Used: 88 MB

This line states that the function was executed with 512 MB of memory assigned, but it used only 88 MB. Once you know how much your function uses, lower the memory allocation to reduce the cost. This is very important, as FaaS is interesting especially because of the cost savings that come from being billed only for the exact resources your code uses.
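If you want to track these numbers over many executions, the REPORT line is easy to parse. The sketch below assumes the line format shown above; the ReportLine class and its methods are made up for this example. It also computes the billed memory-time product in GB-seconds (billed duration multiplied by configured memory), which is the unit Lambda pricing is based on.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReportLine
{
    // Matches the tail of a REPORT line in the format shown above.
    private static final Pattern REPORT = Pattern.compile(
        "Billed Duration: (\\d+) ms\\s+Memory Size: (\\d+) MB\\s+Max Memory Used: (\\d+) MB"
    );

    public final long billedMs;
    public final long memorySizeMb;
    public final long maxUsedMb;

    private ReportLine(long billedMs, long memorySizeMb, long maxUsedMb)
    {
        this.billedMs = billedMs;
        this.memorySizeMb = memorySizeMb;
        this.maxUsedMb = maxUsedMb;
    }

    public static ReportLine parse(String line)
    {
        Matcher matcher = REPORT.matcher(line);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Not a REPORT line: " + line);
        }
        return new ReportLine(
            Long.parseLong(matcher.group(1)),
            Long.parseLong(matcher.group(2)),
            Long.parseLong(matcher.group(3))
        );
    }

    // Memory that was paid for but never used during this execution.
    public long unusedMb()
    {
        return memorySizeMb - maxUsedMb;
    }

    // Billed duration (seconds) times configured memory (GB).
    public double gbSeconds()
    {
        return (billedMs / 1000.0) * (memorySizeMb / 1024.0);
    }

    public static void main(String[] args)
    {
        ReportLine report = parse(
            "REPORT RequestId: ef41d5ef-d35c-11e6-a763-d72726f3e46e       Duration: 1276.37 ms"
            + "    Billed Duration: 1300 ms Memory Size: 512 MB    Max Memory Used: 88 MB"
        );
        System.out.println(report.unusedMb() + " MB unused, " + report.gbSeconds() + " GB-s billed");
    }
}
```

For the line above this gives 424 MB of unused memory. Keep in mind that on Lambda the CPU share scales with the configured memory, so after lowering the allocation the duration may grow - measure again before assuming the saving is proportional.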

Of course there is much more to investigate, but AWS Lambda is still quite fresh for me as well, so don't expect this to be a full guide. Go, read more, analyze, measure and optimize more.

Tags: Lambda, Spring, AWS, Cloud, Optimization, Java