Buggy software calls for a rogue script

Lately the Fermi Science Tools have been driving me crazy.

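One bug is utterly mysterious: the copy I installed myself on a 64-bit server sometimes runs normally and sometimes just eats CPU without doing anything. When that happens I have to kill it and resubmit, and then maybe it works again. How nasty.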

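After a lot of fruitless testing, I gave up and wrote a rogue script that supervises the command automatically: every 30 seconds it checks whether the command has produced the expected output, and if not, it kills the command and runs it again.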

auto_rerun.sh:

#!/bin/bash
# automatically resubmit the command if it does not produce the success string within a short time
# would be useful if you are using some buggy unpredictable software:(
# or when you want to do something ridiculous
#
# usage: 
# auto_rerun.sh mycmd myarguments
#
# output of mycmd will be logged in mycmd$prcid.log, where $prcid will be some integer (processid)
# 
# example:
# ./auto_rerun.sh echo "Working...  Hi I'm working!"
#
# $ J.X. Han, 2011-09-02 17:39:32 , Durham $
# $ http://asc.2dark.org                   $
 
 
#~~~~ customize this part with your needs ~~~~~~~
successstring='Working...' # some words your command will produce upon success
timeout=30                 #the wait time (seconds) to check for success
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
if [ $# -lt 1 ]; then
	echo "please specify a cmd to run" >&2
	exit 1
fi
 
 
echo "running: $*"
echo "will check for: $successstring"
 
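cmdname=$(basename "$1")   # drop any directory part so the log lands in the current directory
cpid=$$                    # PID of this script, used to make the log name unique
logfile=$cmdname$cpid.log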
 
 
"$@" >"$logfile" 2>&1 &   # run your cmd
pid=$!
 
sleep "$timeout"
 
# loop until you get your successstring (-F: match it literally, not as a regex)
until grep -qF -- "$successstring" "$logfile"
#until [ -s "$logfile" ] # loop until you have something in the logfile
do
	kill "$pid" 2>/dev/null   # this run never printed the string: kill it
	wait "$pid" 2>/dev/null   # reap it before starting over
	"$@" >"$logfile" 2>&1 &
	pid=$!
	sleep "$timeout"
done
 
wait "$pid"
 
echo "Command: $*" >>"$logfile"
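
One thing the script does not do is give up: if the command never prints the success string, it gets restarted forever. Here is a minimal sketch of a bounded variant, assuming you want a cap on the number of restarts and a log check every second rather than a single check per timeout. The function name run_with_retries and the max_tries and poll parameters are placeholders made up for illustration; they are not part of auto_rerun.sh.

```shell
#!/bin/bash
# Sketch only: a bounded, polling variant of the loop in auto_rerun.sh.
# run_with_retries, max_tries and poll are hypothetical names for illustration.
#
# usage: run_with_retries MAX_TRIES TIMEOUT SUCCESS_STRING LOGFILE cmd [args...]
run_with_retries() {
	local max_tries="$1"
	local timeout="$2"
	local success="$3"
	local logfile="$4"
	shift 4
	local poll=1        # seconds between log checks (assumed value)
	local try=1
	local waited pid

	while [ "$try" -le "$max_tries" ]; do
		"$@" >"$logfile" 2>&1 &     # start (or restart) the command
		pid=$!
		waited=0
		# poll the log until the string shows up, the command exits, or time runs out
		while [ "$waited" -lt "$timeout" ] && kill -0 "$pid" 2>/dev/null; do
			grep -qsF -- "$success" "$logfile" && break
			sleep "$poll"
			waited=$((waited + poll))
		done
		if grep -qsF -- "$success" "$logfile"; then
			wait "$pid"             # healthy run: let it finish, pass on its exit status
			return
		fi
		kill "$pid" 2>/dev/null     # no success string in time: kill it and retry
		wait "$pid" 2>/dev/null
		try=$((try + 1))
	done
	echo "giving up after $max_tries tries: $*" >&2
	return 1
}
```

Something like `run_with_retries 5 30 'Working...' mycmd.log mycmd myarguments` would then try at most five times, 30 seconds each, and return non-zero if none of the runs ever printed the string. Polling also means a healthy run is recognized as soon as it prints the string, instead of after the full timeout.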