How to optimize RAM usage for huge byte arrays in Java


Environment: Java 17, Spring Boot 3.2.4, Apache Tomcat (the default), G1 GC (the default)

In my application I deal with huge byte arrays, from 5 to 50 MB, because I'm generating PDF files. The operations in the code look like this:

byte[] pdfContent = generatePDFViaHttpCallOnAnotherMicroservice();
byte[] optimizedPdfContent = optimizePdf(pdfContent);
return optimizedPdfContent; //to the user from @RestController

When I analyze the performance of my application, I see that PDF generation is too heavy: about 10 PDF generations per minute will bring down my pod. One of the problems I see is the huge byte[] arrays. As I understand it, they require huge contiguous memory blocks, and the memory is allocated directly in the old generation. Judging by the GC cycles, allocating a 25 MB byte[] requires a GC and a defragmentation of the old generation first.

Possible ways to improve the situation, as I see them:

  • Increasing the region size with -XX:G1HeapRegionSize is not an option, because my app handles a lot of other requests where the request/response size is ~10 KB.
  • Can I have 10% of RAM reserved in the old generation just for PDF generation? How can that be configured?
  • Can I replace byte[] with another data type? The goal of the replacement is to avoid using one huge contiguous block of memory. For example, if the G1 region size is 1 MB and the byte[] is 25 MB, use 25 blocks of 1 MB scattered around the heap. I understand that this will degrade performance, but at the moment that is not as critical as the GC problems, because the application can't handle even a simple load.
  • It would be great to allocate memory for pdfContent in the young generation, because these objects do not live long: only for the duration of a single HTTP request from the user.

Could you please suggest the best solution and recommend how it can be implemented?


Solution

  • This is a known challenge with storing the entire contents of a large file in a byte array in program memory. Since the earliest days of programming, this has been addressed by using byte streams, which process a few bytes at a time, so that the program never has to hold the whole content at once. This can be particularly important for a web service, which might serve hundreds, thousands, or millions of requests.

    In Java, byte streams are represented by InputStreams and OutputStreams. Remove all usage of byte arrays in your code, and replace them with InputStreams and OutputStreams. In particular:

    • generatePDFViaHttpCallOnAnotherMicroservice should accept an OutputStream argument, and should write its content to that OutputStream. Its return type should be void.
    • optimizePdf should accept an InputStream and an OutputStream as arguments. The InputStream argument is the PDF content to be optimized—that is, the content obtained from generatePDFViaHttpCallOnAnotherMicroservice. The OutputStream is not something you create, but rather is received from Spring, when Spring invokes your own implementation of StreamingResponseBody.
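    • To connect an OutputStream to an InputStream, use PipedInputStream and PipedOutputStream. The writing side must run on a different thread from the reading side, and it must close the PipedOutputStream when it finishes so that the reader sees the end of the stream.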
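
As a rough illustration of the stream-based method shape described in the list above, the sketch below moves data from an InputStream to an OutputStream through a fixed 8 KB buffer, so the memory held is the same for a 5 MB or a 50 MB payload. The class name StreamingCopySketch and the method copyInChunks are hypothetical stand-ins for optimizePdf. A real optimizer would transform the data rather than copy it, and whether a given PDF library can work without loading the whole document is an assumption that needs checking.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamingCopySketch {

    // Hypothetical stand-in for optimizePdf(InputStream, OutputStream):
    // it only passes bytes through, one small chunk at a time.
    static long copyInChunks(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // the only buffer held, whatever the payload size
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams are used here only to keep the demo self-contained.
        byte[] sample = new byte[100_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyInChunks(new ByteArrayInputStream(sample), sink);
        System.out.println("bytes copied: " + copied);
    }
}
```

For a plain pass-through, InputStream.transferTo(OutputStream) (available since Java 9) does the same job; the explicit loop is shown only to make the fixed buffer visible.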

    Overall, your code might look something like this:

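    PipedOutputStream generatedPDFDestination = new PipedOutputStream();
    PipedInputStream pdfContent =
        new PipedInputStream(generatedPDFDestination);

    // Generate on another thread: a pipe's writer and reader must not share a thread.
    CompletableFuture.runAsync(() -> {
        // Closing the write end tells the reader that the PDF is complete.
        try (generatedPDFDestination) {
            generatePDFViaHttpCallOnAnotherMicroservice(generatedPDFDestination);
        } catch (IOException e) {
            // A Runnable cannot throw checked exceptions.
            throw new UncheckedIOException(e);
        }
    });

    return new StreamingResponseBody() {
        @Override
        public void writeTo(OutputStream responseBody) throws IOException {
            // Spring supplies responseBody; the optimized PDF is written straight to it.
            optimizePdf(pdfContent, responseBody);
        }
    };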
    

    You can also use a lambda for the returned StreamingResponseBody:

    PipedOutputStream generatedPDFDestination = new PipedOutputStream();
    PipedInputStream pdfContent =
        new PipedInputStream(generatedPDFDestination);

    CompletableFuture.runAsync(() -> {
        try (generatedPDFDestination) {
            generatePDFViaHttpCallOnAnotherMicroservice(generatedPDFDestination);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });

    return responseBody -> optimizePdf(pdfContent, responseBody);
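
To try the piping pattern outside Spring, here is a minimal self-contained sketch. Everything in it is hypothetical scaffolding: produce stands in for generatePDFViaHttpCallOnAnotherMicroservice, a ByteArrayOutputStream stands in for the HTTP response body, and the 64 KB pipe size is an arbitrary choice rather than a recommendation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;

public class PipeSketch {

    // Hypothetical producer: writes `size` bytes in 4 KB chunks.
    static void produce(OutputStream out, int size) throws IOException {
        byte[] chunk = new byte[4096];
        int remaining = size;
        while (remaining > 0) {
            int n = Math.min(chunk.length, remaining);
            out.write(chunk, 0, n);
            remaining -= n;
        }
    }

    static byte[] runPipeline(int size) throws IOException {
        PipedOutputStream producerEnd = new PipedOutputStream();
        // Only this 64 KB pipe buffer sits between producer and consumer.
        PipedInputStream consumerEnd = new PipedInputStream(producerEnd, 64 * 1024);

        CompletableFuture<Void> producer = CompletableFuture.runAsync(() -> {
            // Closing the write end signals end-of-stream to the reader.
            try (producerEnd) {
                produce(producerEnd, size);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });

        // Stand-in for the response body; a real response would not be buffered in memory.
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        try (consumerEnd) {
            consumerEnd.transferTo(response);
        }
        producer.join(); // rethrows a producer failure as CompletionException
        return response.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("bytes received: " + runPipeline(1_000_000).length);
    }
}
```

The producer blocks whenever the pipe buffer is full, so it can never run far ahead of the consumer. That back-pressure is what keeps memory use bounded regardless of the PDF size.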