Error java.io.FileNotFoundException makes runner crash

Guillermo Ruiz García May 26, 2025

I'm running a Kubernetes autoscaler to manage the number of available runners at any given moment. The runner pods run on a GKE cluster on Kubernetes version 1.32. The issue is very frustrating: there is roughly a 50% chance of a runner suddenly crashing with the following error:

java.lang.RuntimeException: java.io.FileNotFoundException
	at com.github.dockerjava.netty.NettyInvocationBuilder.get(NettyInvocationBuilder.java:152)
	at com.github.dockerjava.core.exec.InfoCmdExec.exec(InfoCmdExec.java:24)
	at com.github.dockerjava.core.exec.InfoCmdExec.exec(InfoCmdExec.java:14)
	at com.github.dockerjava.core.command.AbstrDockerCmd.exec(AbstrDockerCmd.java:33)
	at com.atlassian.pipelines.runner.core.service.docker.DockerSystemServiceImpl.lambda$getDockerSystemInfo$0(DockerSystemServiceImpl.java:33)
	at io.reactivex.internal.operators.single.SingleFromCallable.subscribeActual(SingleFromCallable.java:44)
	at io.reactivex.Single.subscribe(Single.java:3666)
	at io.reactivex.internal.operators.single.SingleObserveOn.subscribeActual(SingleObserveOn.java:35)
	at io.reactivex.Single.subscribe(Single.java:3666)
	at io.reactivex.internal.operators.single.SingleMap.subscribeActual(SingleMap.java:34)
	at io.reactivex.Single.subscribe(Single.java:3666)
	at io.reactivex.internal.operators.single.SingleDoOnError.subscribeActual(SingleDoOnError.java:35)
	at io.reactivex.Single.subscribe(Single.java:3666)
	at io.reactivex.internal.operators.single.SingleMap.subscribeActual(SingleMap.java:34)
	at io.reactivex.Single.subscribe(Single.java:3666)
	at io.reactivex.internal.operators.completable.CompletableFromSingle.subscribeActual(CompletableFromSingle.java:29)
	at io.reactivex.Completable.subscribe(Completable.java:2309)
	at io.reactivex.internal.operators.completable.CompletableMergeArray.subscribeActual(CompletableMergeArray.java:49)
	at io.reactivex.Completable.subscribe(Completable.java:2309)
	at io.reactivex.internal.operators.mixed.CompletableAndThenObservable.subscribeActual(CompletableAndThenObservable.java:45)
	at io.reactivex.Observable.subscribe(Observable.java:12284)
	at io.reactivex.internal.operators.observable.ObservableFlatMap.subscribeActual(ObservableFlatMap.java:55)
	at io.reactivex.Observable.subscribe(Observable.java:12284)
	at io.reactivex.internal.operators.observable.ObservableFlatMapCompletableCompletable.subscribeActual(ObservableFlatMapCompletableCompletable.java:49)
	at io.reactivex.Completable.subscribe(Completable.java:2309)
	at io.reactivex.internal.operators.completable.CompletableOnErrorComplete.subscribeActual(CompletableOnErrorComplete.java:35)
	at io.reactivex.Completable.subscribe(Completable.java:2309)
	at io.reactivex.Completable.blockingAwait(Completable.java:1226)
	at com.atlassian.pipelines.runner.core.ApplicationImpl.main(ApplicationImpl.java:59)
Caused by: java.io.FileNotFoundException
	at io.netty.channel.unix.Errors.newConnectException0(Errors.java:164)
	at io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:131)
	at io.netty.channel.unix.Socket.connect(Socket.java:351)
	at io.netty.channel.epoll.AbstractEpollChannel.doConnect0(AbstractEpollChannel.java:778)
	at io.netty.channel.epoll.AbstractEpollChannel.doConnect(AbstractEpollChannel.java:763)
	at io.netty.channel.epoll.EpollDomainSocketChannel.doConnect(EpollDomainSocketChannel.java:88)
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.connect(AbstractEpollChannel.java:602)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1289)
	at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:655)
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:634)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.connect(CombinedChannelDuplexHandler.java:495)
	at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:51)
	at io.netty.channel.CombinedChannelDuplexHandler.connect(CombinedChannelDuplexHandler.java:296)
	at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:657)
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:634)
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:618)
	at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:927)
	at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:264)
	at io.netty.bootstrap.Bootstrap$3.run(Bootstrap.java:264)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:408)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)  
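
For reference, the top of that stack trace is the runner calling the Docker Engine "info" endpoint through docker-java's Netty transport, which connects to the Docker daemon over a unix domain socket (see the EpollDomainSocketChannel frames). Below is a minimal sketch of that same call path, just to make the trace easier to read; the socket path, class name, and builder calls are illustrative assumptions rather than the runner's actual code, and the exact builder API varies between docker-java versions.

import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.api.model.Info;
import com.github.dockerjava.core.DefaultDockerClientConfig;
import com.github.dockerjava.core.DockerClientBuilder;
import com.github.dockerjava.netty.NettyDockerCmdExecFactory;

public class DockerInfoProbe {
    public static void main(String[] args) {
        // Assumed daemon address; the runner's actual DOCKER_HOST may differ.
        DefaultDockerClientConfig config = DefaultDockerClientConfig.createDefaultConfigBuilder()
                .withDockerHost("unix:///var/run/docker.sock")
                .build();

        // Netty transport, matching the com.github.dockerjava.netty frames in the trace above.
        DockerClient client = DockerClientBuilder.getInstance(config)
                .withDockerCmdExecFactory(new NettyDockerCmdExecFactory())
                .build();

        // Same kind of call as DockerSystemServiceImpl.getDockerSystemInfo(): GET /info on the daemon.
        // If the unix socket cannot be connected to, the connect failure surfaces as the
        // java.io.FileNotFoundException seen in the stack trace above.
        Info info = client.infoCmd().exec();
        System.out.println("Docker server version: " + info.getServerVersion());
    }
}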

When inspecting the runner logs closely, I can see that the error is raised right after the runner sends a request to update its state:

...
[2025-05-15 07:58:22,110] Updating runner state to "ONLINE".
[2025-05-15 07:58:22,125] [e6af3882-6, L:/10.12.1.6:50954 - R:api.atlassian.com/13.35.248.26:443] The connection observed an error, the request cannot be retried as the headers/body were sent io.netty.channel.unix.Errors$NativeIoException: recvAddress(..) failed: Connection reset by peer
[2025-05-15 07:58:22,131] {"traceId":"68259e064d079cd36239b377a79fd552","parentId":"6239b377a79fd552","id":"f82b2418f490544b","kind":"CLIENT","name":"PUT","timestamp":1747295902111673,"duration":19870,"localEndpoint":{"serviceName":"runner","ipv4":"10.12.1.6"},"tags":{"http.method":"PUT","http.path":"/ex/bitbucket-pipelines/rest/internal/accounts/{50e38d04-1187-4665-a670-45319c5c824c}/runners/{4c335687-68f4-53dd-9e95-bad565cae9d0}/state","error":"recvAddress(..) failed: Connection reset by peer; nested exception is io.netty.channel.unix.Errors$NativeIoException: recvAddress(..) failed: Connection reset by peer"}}
[2025-05-15 07:58:22,839] {"traceId":"68259e064d079cd36239b377a79fd552","parentId":"6239b377a79fd552","id":"9ec238192a7a5f58","kind":"CLIENT","name":"PUT","timestamp":1747295902632302,"duration":206878,"localEndpoint":{"serviceName":"runner","ipv4":"10.12.1.6"},"tags":{"http.method":"PUT","http.path":"/ex/bitbucket-pipelines/rest/internal/accounts/{50e38d04-1187-4665-a670-45319c5c824c}/runners/{4c335687-68f4-53dd-9e95-bad565cae9d0}/state"}}
...

There are no firewalls that could be blocking requests. In the last week alone I have recorded 328 errors like this, and there are never more than 16 runners running at the same time.

Could Atlassian API rate limiting be blocking requests from runners and stopping them suddenly, causing jobs to fail?

Thanks.

1 answer

Patrik S
Atlassian Team
May 27, 2025

Hello @Guillermo Ruiz García, and welcome to the Community!

I confirmed from our internal logs that there were no rate-limit events on your workspace in the last few days, so I don't think your autoscaler is failing due to rate limits in particular.

The "Connection reset by peer" error is usually related to a network issue in the connection of your infrastructure to the Atlassian infrastructure.

If you have any sort of connection filtering/proxy/firewall, it's important to have the following IP ranges allowed for both incoming and outgoing traffic, so runners can communicate with Atlassian infra (a quick way to check an address against these ranges is sketched right after the list):

104.192.136.0/21
185.166.140.0/22
13.200.41.128/25
13.35.248.0/24
13.227.180.0/24
13.227.213.0/24
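
If you want to quickly sanity-check a specific address against those ranges (for example the 13.35.248.26 peer that shows up in your log excerpt), a small sketch along these lines can do the CIDR membership test; the class and method names are only illustrative:

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;
import java.util.List;

public class AllowedRangeCheck {

    // IP ranges listed above.
    private static final List<String> RANGES = List.of(
            "104.192.136.0/21",
            "185.166.140.0/22",
            "13.200.41.128/25",
            "13.35.248.0/24",
            "13.227.180.0/24",
            "13.227.213.0/24");

    public static void main(String[] args) {
        // Peer address seen in the runner log excerpt (api.atlassian.com).
        String peer = "13.35.248.26";
        boolean allowed = RANGES.stream().anyMatch(range -> inRange(peer, range));
        System.out.println(peer + " in allowed ranges: " + allowed);
    }

    // True if the IPv4 address falls inside the given CIDR block.
    private static boolean inRange(String address, String cidr) {
        try {
            String[] parts = cidr.split("/");
            int prefix = Integer.parseInt(parts[1]);
            int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
            int network = toInt(InetAddress.getByName(parts[0]));
            int candidate = toInt(InetAddress.getByName(address));
            return (candidate & mask) == (network & mask);
        } catch (UnknownHostException e) {
            return false;
        }
    }

    private static int toInt(InetAddress address) {
        return ByteBuffer.wrap(address.getAddress()).getInt();
    }
}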

If you are still facing the issue after confirming that traffic to those IP ranges is allowed, could you confirm whether this started recently or whether this error has always occurred in this cluster?

Thank you, @Guillermo Ruiz García !

Patrik S

Guillermo Ruiz García June 3, 2025

Hello @Patrik S

I can confirm that there are no firewalls whatsoever in my Kubernetes cluster that could be filtering these connections. I use Cloud NAT, so all outgoing traffic is routed through a Cloud Router (to keep the same source IP address), but again, no filtering or firewall rules are applied there.

This problem has been around since day one of migrating the runners to Kubernetes. What else could it be?

Regards,

Guillermo.

Patrik S
Atlassian Team
June 6, 2025

Hello @Guillermo Ruiz García ,

Even if there are no explicit firewalls, network instability or brief outages can cause connection resets. Consider setting up network monitoring to catch any transient issues and verify that your Cloud NAT and Router configurations are correct and that there are no unintended packet drops or connection limits.
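
For example, a minimal probe along these lines could be run inside the cluster (as a sidecar or one-off pod) to log transient connection failures toward the Atlassian API; the hostname, interval, and class name are placeholders for illustration, not a recommendation of a specific tool:

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Instant;

public class ConnectivityProbe {
    public static void main(String[] args) throws InterruptedException {
        // Placeholder target and interval; adjust to whatever endpoint your runners talk to.
        String host = "api.atlassian.com";
        int port = 443;
        long intervalMillis = 10_000;

        while (true) {
            long start = System.nanoTime();
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 5_000);
                // Complete a TLS handshake so resets during the handshake are caught as well.
                SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
                try (SSLSocket tls = (SSLSocket) factory.createSocket(socket, host, port, true)) {
                    tls.startHandshake();
                }
                long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("%s OK %s:%d handshake in %d ms%n", Instant.now(), host, port, elapsedMillis);
            } catch (IOException e) {
                // "Connection reset by peer" and similar transient errors will show up here.
                System.err.printf("%s FAIL %s:%d %s%n", Instant.now(), host, port, e);
            }
            Thread.sleep(intervalMillis);
        }
    }
}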

Also, ensure that there are no resource constraints (CPU, memory) in your cluster that might affect the runner's network operations.

Additionally, I'd suggest testing the same setup in a different cluster or cloud provider, if possible, to isolate whether the issue is environment-specific.

Patrik S

 
