Locking in bash scripts
Sometimes you need to make sure that no more than one instance of your bash script is running at a time. If your platform has the flock command, this is quite simple to do:
#!/bin/bash
LOCK_FILE=/tmp/my-script.lock
LOCK_FD=9
get_lock() {
    # eval is needed here so that $LOCK_FD expands before exec runs
    eval "exec $LOCK_FD>$LOCK_FILE"
    flock -n $LOCK_FD
}
get_lock || exit
# ...
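To see the lock in action, save the script (I'll assume the name my-script.sh for this example) and start two copies back to back:

./my-script.sh &    # first copy: flock succeeds and the script does its work
./my-script.sh      # second copy: flock -n fails at once, get_lock returns
                    # non-zero and the script exits immediately

The lock is released automatically when the last file descriptor referring to the lock file is closed, normally when the first copy exits.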
When using this approach, remember that child processes inherit the file descriptors opened by the parent process. I had a script that ran from cron: it started ssh-agent if one wasn't already running and then executed commands on multiple servers via ssh. ssh-agent inherited the lock file descriptor and kept it open, so the lock was never released and the script effectively ran only once, on the invocation that started ssh-agent. To avoid this, you must explicitly close the lock file descriptor when invoking a command that spawns a long-lived child process.
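The problem is easy to reproduce with a stripped-down sketch (the lock path and the backgrounded sleep here are stand-ins I've picked for illustration):

#!/bin/bash
exec 9>/tmp/demo.lock
flock -n 9 || exit      # first run: the lock is acquired
sleep 1000 &            # the child inherits fd 9 and keeps the lock
                        # file open long after this script has exited

While that sleep is alive, a second run fails to take the lock even though the first run finished long ago. In my case, the fix looked like this: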
#!/bin/bash
LOCK_FILE=/tmp/my-script.lock
LOCK_FD=9
SSH_KEY=/root/.ssh/id_rsa.for.ssh-agent
get_lock() {
    # eval is needed here so that $LOCK_FD expands before exec runs
    eval "exec $LOCK_FD>$LOCK_FILE"
    flock -n $LOCK_FD
}
get_lock || exit
socket=$(find /tmp/ssh-*/agent.* -user root 2>/dev/null || true)
if [ -z "$socket" ]; then
    # eval is needed here for proper expansion; the lock fd must be
    # closed explicitly, otherwise ssh-agent keeps it open and the
    # lock can't be acquired again until ssh-agent exits
    eval ". <(ssh-agent $LOCK_FD>&-)"
    ssh-add $SSH_KEY
else
    : # ...
fi
# ...
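If you don't need the lock logic inside the script itself, flock also has a command form that opens and closes the lock file for you; a crontab entry along these lines (path and schedule are made up for the example) gives the same single-instance guarantee:

*/5 * * * * flock -n /tmp/my-script.lock /path/to/my-script.sh

Note that this form does not sidestep the inheritance problem: a daemon started from inside the script still inherits the descriptor that flock opened, which is why the explicit-fd approach above is more convenient for the ssh-agent case.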
If for some reason you can't use flock, the same functionality can be implemented in pure bash:
#!/bin/bash
set -u
PID_LIST=/tmp/test-get-lock.pid
get_lock() {
    local pid
    while true; do
        while read pid; do
            # skip pids of already-terminated processes
            kill -0 $pid || continue
            # a live pid that isn't ours: another instance holds the lock
            [ "$pid" != "$BASHPID" ] && return 1
            # our pid is the first live one: rewrite the pid file
            # atomically so it contains only our pid, and report success
            echo $BASHPID >$PID_LIST.new && mv $PID_LIST.new $PID_LIST && return 0
        done < $PID_LIST
        # our pid was not in the file (or the file doesn't exist yet):
        # append our pid and re-run the check
        echo $BASHPID >>$PID_LIST
    done
}
if get_lock 2>/dev/null; then
    sleep 1
    pids="$(cat $PID_LIST)"
    pid=$(echo "$pids" | head -n1)
    [ "$BASHPID" != "$pid" ] && echo "pid: $BASHPID unexpected pid: $pid $pids"
    echo "pid: $BASHPID get_lock success"
else
    echo "pid: $BASHPID get_lock failed"
fi
Here's how it works (a usage sketch follows the list):
- The file holds process IDs (pids). We read the pids from the file and check each one against the running processes.
- Pids of terminated processes are ignored.
- A pid of a running process that does not match the current process means that another instance of the script is already running, and get_lock reports failure.
- If we reach the pid of the current process first, we rewrite the pid file so that it contains only the current pid (mv is an atomic operation) and continue executing the script.
- If we fall out of the check loop without finding our pid, we append the current pid to the end of the file and repeat the check. Appending to the end of a file is an atomic operation.
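Wiring this into a real cron script rather than the test harness looks roughly like this (a sketch; the pid-file path is illustrative, and get_lock is the function defined above):

#!/bin/bash
set -u
PID_LIST=/tmp/my-script.pid
get_lock() {
    : # body exactly as defined above
}
get_lock 2>/dev/null || exit 0
# ... the actual work goes here; no unlock step is needed, because
# ownership is tied to process liveness via kill -0: when this process
# exits (or crashes), its pid goes stale and the next contender can
# take the lock ...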
How reliable is this solution? While debugging, I used the following command for testing:
rm -f /tmp/*.log
for x in {0000..9999}; do ./lock-test.sh >/tmp/$x.log 2>&1 & done
wait
echo "success: $(grep success /tmp/*.log|wc -l), failure: $(grep failed /tmp/*.log|wc -l), unexpected pid: $(grep unexpected /tmp/*.log|wc -l)"
The absence of unexpected pids meant that the code worked correctly. For final testing, I used the following command:
for y in {000..999}; do
    echo -n " $y"
    bash -c 'rm -f /tmp/*.log
        for x in {0000..9999}; do ./lock-test.sh >/tmp/$x.log 2>&1 & done
        wait' 2>/dev/null
    grep unexpected /tmp/*.log && break
done
I ran this test on my 4-core i7 laptop, on a 2-core virtual machine, and on a 24-core server; no problems were found in any of the cases. I admit that my testing was not exhaustive and the proposed code may not work correctly under some circumstances. Still, if you use it to make a cron script run as a single instance, you will most likely have no problems.