Environment: Java 17, Spring Boot 3.2.4, Apache Tomcat by default, G1 by default
In my application I deal with huge byte arrays, from 5 to 50 MB, because I'm generating PDF files. The operations in the code look like this:
byte[] pdfContent = generatePDFViaHttpCallOnAnotherMicroservice();
byte[] optimizedPdfContent = optimizePdf(pdfContent);
return optimizedPdfContent; //to the user from @RestController
When I analyze the performance of my application, I see that PDF generation is too heavy: about 10 PDF generations a minute will bring down my pod. One of the problems I see is the huge byte[] arrays. As I understand it, they require huge contiguous memory blocks, and the allocation is performed directly in the old generation. As I understand from the GC cycles, to allocate a 25 MB byte[] I need to perform a GC and defragmentation of memory in the old gen.
Solutions to improve the situation that I see:
-XX:HeapRegionSize is not an option, because my app handles a lot of other requests, where the request/response size is ~10 KB.
pdfContent in new generation, because these objects do not live long, only in the scope of a single HTTP request from the user.
Could you please offer me the best solution and recommendations on how it can be implemented?
This is a known challenge with storing the entire contents of a large file in a byte array in program memory. Since the earliest days of programming, this has been addressed by using byte streams, which process a few bytes at a time, in order to avoid placing a burden on the program. This can be particularly important for a web service, which might serve hundreds or thousands or millions of requests.
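As a rough illustration of what "a few bytes at a time" means in practice, here is a minimal sketch of a chunked copy. The class name ChunkedCopy and the 8 KiB buffer size are arbitrary choices for the example, not something from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ChunkedCopy {
    // Copies everything from in to out through one small reusable buffer,
    // so the copy itself never holds more than 8 KiB of content at once,
    // regardless of how large the stream is.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total; // number of bytes copied
    }
}
```

Since Java 9, InputStream.transferTo(OutputStream) performs the same kind of buffered copy, so in real code you would usually call that instead of writing the loop yourself.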
In Java, byte streams are represented by InputStreams and OutputStreams. Remove all usage of byte arrays in your code, and replace them with InputStreams and OutputStreams. In particular:
generatePDFViaHttpCallOnAnotherMicroservice should accept an OutputStream argument, and should write its content to that OutputStream. Its return type should be void.
optimizePdf should accept an InputStream and an OutputStream as arguments. The InputStream argument is the PDF content to be optimized, that is, the content obtained from generatePDFViaHttpCallOnAnotherMicroservice. The OutputStream is not something you create, but rather is received from Spring, when Spring invokes your own implementation of StreamingResponseBody.
Overall, your code might look something like this:
    PipedOutputStream generatedPDFDestination = new PipedOutputStream();
    PipedInputStream pdfContent =
        new PipedInputStream(generatedPDFDestination);
    CompletableFuture.runAsync(() -> {
        // Close the pipe when generation ends, so the reader sees end-of-stream.
        try (generatedPDFDestination) {
            generatePDFViaHttpCallOnAnotherMicroservice(generatedPDFDestination);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });
    return new StreamingResponseBody() {
        @Override
        public void writeTo(OutputStream responseBody)
        throws IOException {
            optimizePdf(pdfContent, responseBody);
        }
    };
You can also use a lambda for the returned StreamingResponseBody:
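    PipedOutputStream generatedPDFDestination = new PipedOutputStream();
    PipedInputStream pdfContent =
        new PipedInputStream(generatedPDFDestination);
    CompletableFuture.runAsync(() -> {
        // Close the pipe when generation ends, so the reader sees end-of-stream.
        try (generatedPDFDestination) {
            generatePDFViaHttpCallOnAnotherMicroservice(generatedPDFDestination);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });
    return responseBody -> optimizePdf(pdfContent, responseBody);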
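To see the pipe mechanics in isolation, without Spring or the remote service, here is a small self-contained sketch. PipeDemo, produce and run are hypothetical names invented for this example: produce stands in for the remote PDF call, and the discarding output stream stands in for the response body that Spring would supply. The 64 KiB pipe size is likewise an arbitrary assumption.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;

class PipeDemo {
    // Hypothetical stand-in for the remote PDF call: writes `size` bytes in 4 KiB chunks.
    static void produce(OutputStream out, int size) throws IOException {
        byte[] chunk = new byte[4096];
        for (int written = 0; written < size; written += chunk.length) {
            out.write(chunk, 0, Math.min(chunk.length, size - written));
        }
    }

    // Streams `size` bytes through a pipe and returns how many arrived.
    static long run(int size) throws IOException {
        PipedOutputStream producerEnd = new PipedOutputStream();
        // The pipe buffers at most 64 KiB; the producer blocks while it is full.
        PipedInputStream consumerEnd = new PipedInputStream(producerEnd, 64 * 1024);

        CompletableFuture<Void> producer = CompletableFuture.runAsync(() -> {
            // Closing the producer end signals end-of-stream to the reader.
            try (producerEnd) {
                produce(producerEnd, size);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });

        long copied;
        try (consumerEnd) {
            // Stand-in for writing to the HTTP response body.
            copied = consumerEnd.transferTo(OutputStream.nullOutputStream());
        }
        producer.join(); // rethrows any failure from the producer side
        return copied;
    }
}
```

The point of the sketch is that only the pipe's buffer is held in memory at any moment, however many bytes flow through. The producer and the consumer must run on different threads, otherwise a full pipe would block forever.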