Always try to start the schedule at least 5 mins in advance of the first scan.
If the scheduled start time is in the future, start the schedule with
schedule=r4447hb,#1
You can schedule commands in the field system through the operator input with a command like
!2014.293.16:00:00
or
!+5h
(for 5 hours from now). Any commands entered after this will not be carried out until the specified time. NB - this can make it look like the FS has locked up! You can remove any queued commands with the
flush
command, entered in the operator input.
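As a rough illustration of the two time forms shown above, the following Python sketch converts them to a calendar time. The function name parse_fs_time is hypothetical, and only the two forms that appear in the text (absolute year.day-of-year.time, and relative hours) are handled; the field system may accept other forms that are not covered here.

```python
from datetime import datetime, timedelta
import re


def parse_fs_time(spec, now):
    """Return the datetime at which a '!...' time specification would fire.

    Handles only the two forms shown in the text:
      '!YYYY.DDD.HH:MM:SS'  absolute time, DDD is the day of year
      '!+Nh'                N hours after `now`
    """
    body = spec.lstrip("!")
    relative = re.fullmatch(r"\+(\d+)h", body)
    if relative:
        return now + timedelta(hours=int(relative.group(1)))
    # %j is the zero-padded day of the year (001-366)
    return datetime.strptime(body, "%Y.%j.%H:%M:%S")
```

This can be handy for checking which calendar date a day-of-year value corresponds to before queuing a command.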
If you are starting late (or re-starting for some reason), start the schedule with
schedule=r4447hb
If you try to start only slightly late (i.e. within five minutes of the scheduled start), you may get errors like: m5 -900 no scans; m5 -900 not while recording or playing; m5 -900 can't get device info. The Mark5 is looking for the previous scheduled scan to check, which doesn't exist, and can't start recording the next scheduled scan. In this case, start the schedule with a start line number:
schedule=r4447hb,#24
Otherwise: do NOT specify a start line number if starting late.
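The start rules above can be summarised in a small Python sketch. The function name start_command and its parameters are hypothetical; restart_line is only meant to be supplied after the m5 -900 errors described above have actually been seen.

```python
from datetime import datetime


def start_command(schedule, scheduled_start, now, restart_line=None):
    """Pick the schedule= command to type, following the rules in the text.

    schedule        -- schedule name, e.g. 'r4447hb'
    scheduled_start -- datetime of the first scan
    now             -- current datetime
    restart_line    -- line number to start from; only give this if a late
                       start has already produced the m5 -900 errors
    """
    if now < scheduled_start:
        # Start time is in the future: begin at the first line
        return f"schedule={schedule},#1"
    if restart_line is not None:
        # Slightly late and the Mark5 complained: name the line explicitly
        return f"schedule={schedule},#{restart_line}"
    # Late start or restart: no line number
    return f"schedule={schedule}"
```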
Then send a start message to IVS.
Every two hours during the experiment eRemoteControl will bring up a checklist. Please run through the checklist when it appears. A description of the checklist parameters and what to look out for is available here:
Please also edit and update the Handover Notes page with any information on current issues, recent problems etc that need to be passed on to the next observer.
The system monitor provides a useful summary of the drives & other Monica parameters. You can run it on ops2, ops4 or ops5 with the command monitor_system.pl or from the drop-down menu on the desktop “Applications → AuScope → Monitor System”. Run it once for each site.
You can launch clock displays too from the “Applications → AuScope” menu.
It's best to only have the Katherine and Yarragadee VNC sessions running when running through the checklist and to rely on eRemoteControl, the system monitor and the log monitor the rest of the time. You should use the VNC sessions to check that the autocorrelation spectra are OK though.
This web page provides a one-page summary of the webcams and weather radar.
The PC in the “lounge” (ops6) runs the same PCFS log monitor that ops4 does. To start it, double-click on the “Log monitor” icon on the desktop and then open the same log file that econtrol is writing to. Run it once for each station.
2012.297.03:58:54.01?ERROR st -998 reading SystemClock1
2012.297.03:58:54.01?ERROR st -999 TCP/IP connection was closed by remote peer
2012.297.03:58:54.01?ERROR st -5 Error return from antenna, see Mbus error.
and you'll need to re-establish the connection by typing:
antenna=open (this re-opens the connection)
antenna=status (if the connection has re-established, you will see several lines of status messages)
If it still doesn't work, run the following from a terminal:
ping syshb
If syshb is not “pingable”, it might be a communication issue; contact the on-call person and tell them.
2012.312.18:08:21.42?ERROR m5 -900 Probably no such disk
2012.312.18:08:21.42?ERROR m5 -904 MARK5 return code 4: error encountered (during attempt to execute)
It can be ignored. It seems to be linked to disk modules that contain less than the maximum number of disk drives (8).
2012.312.01:50:32.28#trakl# Computer time window is 270 milliseconds
2012.312.01:50:32.28?ERROR st -24 Computer time window exceeded 0.25 seconds, see value above.
2012.312.01:50:32.28#trakl# ACU Time window is 252 milliseconds
2012.312.01:50:32.28?ERROR st -26 ACU time window exceeded 0.25 seconds, see value above.
This is probably a network issue and only seems to happen when we’re running eTransfers. It can be safely ignored.
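When skimming a long log, it can help to separate these known-harmless errors from everything else. The Python sketch below parses ?ERROR lines in the layout shown in the examples above and flags the ones this page says can be ignored. The names parse_error, is_benign and BENIGN are hypothetical, and the list only covers messages quoted on this page. The m5 -904 line is deliberately left out, on the assumption that the same return code may accompany other, more serious errors.

```python
import re

# Layout taken from the log examples in the text:
#   YYYY.DDD.HH:MM:SS.ss?ERROR <source> <code> <message>
ERROR_LINE = re.compile(
    r"^(?P<time>\d{4}\.\d{3}\.\d{2}:\d{2}:\d{2}\.\d{2})\?ERROR\s+"
    r"(?P<source>\w+)\s+(?P<code>-?\d+)\s+(?P<text>.*)$"
)

# (source, code, message prefix) for errors the text says can be ignored
BENIGN = [
    ("m5", -900, "Probably no such disk"),
    ("st", -24, "Computer time window exceeded"),
    ("st", -26, "ACU time window exceeded"),
]


def parse_error(line):
    """Return (time, source, code, text) for an ?ERROR line, else None."""
    match = ERROR_LINE.match(line.strip())
    if match is None:
        return None
    return (match["time"], match["source"], int(match["code"]), match["text"])


def is_benign(line):
    """True if the line is one of the ignorable errors listed in BENIGN."""
    parsed = parse_error(line)
    if parsed is None:
        return False
    _, source, code, text = parsed
    return any(
        source == s and code == c and text.startswith(prefix)
        for s, c, prefix in BENIGN
    )
```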
2012.314.18:02:02.16#antcn#Antenna outside nominal tracking tolerance of 0.0045 degrees, current tolerance 0.0100.
If this persists then there may be a tracking problem but if it only occurs before/after slews or in strong winds (see next note), it can probably be safely ignored.
The ptol command can be used to raise the tolerance, but in general it's probably better to just put up with it.
To reinitialise the connection to the antenna:
antenna=open
To get the antenna moving again:
antenna=operate
When the antenna is stuck, launch the VNC to the timepc and go to the antenna control window. Look for red buttons there. Set the schedule to halt in the FS input. You can turn the antenna on/off manually from operate/standby, and switch the Drives on/off. If necessary, do the RESETS at the bottom.
After turning the drives on, it is recommended to wait a little while before putting the antenna in 'operate'. This step is crucial if the previous steps haven't resolved the issue.
Please see the instructions on http://auscope.phys.utas.edu.au/opswiki/doku.php?id=operations:documentation:dbbc_restart for restarting the dbbc if it dies.
Starting a schedule file with no additional arguments will start the observations according to the schedule, with the first observation beginning no earlier than 5 minutes from now. This is usually the best option. If you want to specify a particular part of the file to start in then you can do it as follows (taken from the manual):
Syntax: schedule=name,start,#lines
Response: schedule/name,line
Settable parameters:
  name: Name of schedule file to be started. If no directory path is specified, /usr2/sched assumed. If no extension is specified, .snp is assumed. Any currently-executing schedule file is closed, and the new schedule file is opened. If the new file cannot be opened, there will be no schedule active. When a valid schedule is started, a cont command may be necessary.
  start: Place in the schedule to begin executing. May be one of the following:
    null: to start with the observation beginning no earlier than 5 minutes from now.
    #line: for a line number in the file, should be a source command.
    time: to start with the observation beginning no earlier than this time. time is in standard SNAP format.
  #lines: Number of lines to execute before automatically halting. Default is the remainder of the schedule.
Monitor-only parameters:
  line: The line number to be executed next.
Comments: If the schedule is started successfully, a log file having the same name as the schedule is automatically started, and the procedure file having the same name as the schedule is automatically established as the schedule procedure library. Any previously time-scheduled procedures from this library are cancelled. If a # of lines is specified, an automatic halt will be issued after execution of these lines. The schedule may then be continued using the cont command.
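To make the three forms of the start parameter concrete, here is a small Python sketch that splits a schedule= command into its fields and says which form of start was used. The function names are hypothetical, the time form is not validated (anything that is neither empty nor #line is simply labelled as a time), and the third field is returned as raw text because the manual excerpt does not show an example of it.

```python
import re


def classify_start(start):
    """Label the start field: 'next' (null), 'line' (#N) or 'time'."""
    if not start:
        return "next"   # null: first observation at least 5 minutes from now
    if re.fullmatch(r"#\d+", start):
        return "line"   # a line number in the schedule file
    return "time"       # assumed to be a SNAP-format time; not checked here


def parse_schedule(command):
    """Split 'schedule=name,start,lines' into a dictionary of fields."""
    keyword, _, args = command.partition("=")
    if keyword.strip() != "schedule" or not args:
        raise ValueError("expected a command of the form schedule=name[,start[,lines]]")
    fields = [field.strip() for field in args.split(",")]
    fields += [""] * (3 - len(fields))   # pad missing optional fields
    name, start, lines = fields[:3]
    return {
        "name": name,
        "start": start,
        "start_kind": classify_start(start),
        "lines": lines or None,
    }
```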
If you receive a persistent “rfpcn: error opening, rfpic probably not running, see above for error” report, or notice that the recording is notably behind the summary file and the backlog grows, you might want to restart Rxmon.
Log in to pcfs as root and perform the following command:
pcfsyg:~# su
pcfsyg:~# /etc/init.d/Monica.Rxmon stop
pcfsyg:~# ps -ef | grep Rxmon
pcfsyg:~# /etc/init.d/Monica.Rxmon start
If the command worked, you will see the parameters listing.
The “/etc/init.d/Monica.Rxmon start” command may fail with “ERROR on binding: Address already in use”. If so, wait a minute and try again. If it still doesn't work, restart Monica:
pcfsyg:~# /etc/init.d/Monica.monica stop
pcfsyg:~# /etc/init.d/Monica.monica start
The system monitor will close and need to be reopened.
Below is an edited description of the problem and how to fix it. The log monitor software should ring an alarm if it occurs but periodic checking of the scan_check output is also advised. The problem seems to occur either when there is a problem with a disk in a module (e.g. poor write speed) or when a Mark5 configuration command (e.g. fmset or mk5b_mode) is sent while recording.
To: Stations with Mark 5B/5B+ Recorders
From: Ed Himwich, Dan Smythe, and Rich Strand
Date: 8 May 2012
Re: Mark 5B/B+ recorder ",E" errors from "scan_check"
INTRODUCTION
It has been recently noticed with Mark 5B/5B+ recorders that sometimes a ",E" occurs at the end of the "scan_check" response. This indicates a problem with the data format on the disk. It is not entirely understood what causes this problem, but when it occurs, the scan with the error is unusable. If it occurs all the time, corrective action is needed. We list below steps to take to deal with this problem if it occurs at your station.
Please be aware that once this error occurs persistently on a module, it is not safe to record any more data on that module. Set the module aside, appropriately labeled, and in its place use an empty module that has been erased/conditioned in your recorder. After changing modules, you should test with the new module using the "recscan" procedure. The next section gives a complete procedure for recovery.
RECOVERY PROCEDURE
If the ",E" error is occurring persistently, please take the following actions:
(1) Halt the schedule:
disk_record=off
halt
When there is a problem with any Mark 5 Recorder, it is rarely helpful to terminate the FS. It's not good to halt the schedule or terminate the FS while recording, you will fill the disk. The FS will try to prevent you from terminating while recording, but it won't try to stop you from halting the schedule while recording, so please try to avoid that.
(2) It is now necessary to swap to a fresh, blank module.
If the other Mark5 bank contains a blank module:
(2.1) Select it using the command "mk5=bank_set=inc"
(2.2) Ask a local to remove the module that is having the error and label it appropriately.
If the other bank is empty or contains a module with data that should be kept:
(2.3) Ask a local to remove the module that is having the error and label it appropriately.
(2.4) Ask a local to insert an empty module in this recorder, preferably already erased/conditioned.
(2.5) Please make sure the new module is the one selected for recording. If not, use "mk5=bank_set=inc" to change which is selected.
(3) Once you have verified that the new module has been selected, erase it with:
mk5=protect=off
mk5=reset=erase
(standard procedure for any fresh Mark 5 module)
(4) Test the new module with:
recscan
(5) If the scan_check from (4) does not show the ",E" error, the problem is probably resolved.
(6) After verifying, again, that the new module that you used in step (4) is selected, erase this new module:
mk5=protect=off
mk5=reset=erase
(7) If the problem was resolved in step (5), you can rejoin the schedule at the next opportunity using the "schedule=..." command. Do not use the "CONT" command since this will attempt to observe the scans you missed since you entered "halt". You can (should) use the new module that you used for step (4) and erased in step (6) to continue the schedule.
If the problem was not resolved, please contact the on-call person. The next step would be to try a complete restart of the Mark 5 with a power cycle to see if that helps. You can also try another new module.
(10) If there are some scans on the "bad" module without the error, it should be sent for correlation.
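For periodic checking of the scan_check output, a test along the following lines could be used. This is only a sketch in Python: the function names are hypothetical, the only property relied on is the one stated in the memo (a ",E" at the end of the response), and the threshold of three consecutive bad scans for "persistent" is an assumption, not a value from the memo.

```python
def scan_has_format_error(response):
    """True if a scan_check response ends with ',E' (ignoring a trailing ';')."""
    return response.strip().rstrip(";").rstrip().endswith(",E")


def error_is_persistent(responses, threshold=3):
    """True if the most recent `threshold` responses all carry the ',E' flag.

    `threshold` is an assumed value; choose whatever the station considers
    persistent.
    """
    recent = responses[-threshold:]
    return len(recent) == threshold and all(
        scan_has_format_error(response) for response in recent
    )
```

The strings passed in would be the scan_check responses copied from the log; no particular field layout is assumed beyond the trailing flag.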
If the DrivePC fails, you will probably get an error like this:
00:40:34#antcn#Error: Cannot get monitor info from antenna (8020002)
00:40:34#antcn#Network I/O Timeout occurred on read/accept
This means you will need to restart the DrivePC. To do this:
1. Open a VNC session to Newsmerd.
2. Open a terminal on Newsmerd.
3. Enter the command
rem_reboot -r rakbus sys26m
4. Ping the DrivePC to wait until it's running. When that happens, enter (into the field system)
source=disable
This creates a new socket connection from the field system to the Drive PC.
5. Then, re-enter the schedule to start it going again to make sure it goes to the right source. Be sure to check that the drives are coming on. You can check if it's moving by typing
onsource
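Waiting for the DrivePC to answer pings (step 4 above) can be automated with a small polling loop. The Python sketch below is an illustration only: the function names are hypothetical, the host name must be supplied by the operator (the DrivePC's network name is not given on this page), and ping_once assumes a Linux-style ping that accepts -c (count) and -W (timeout in seconds).

```python
import subprocess
import time


def ping_once(host):
    """Send one ping; True if the host answered (Linux-style ping assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_up(host, probe=ping_once, attempts=60, delay=5.0, sleep=time.sleep):
    """Poll `host` until it responds.

    Returns the attempt number that succeeded, or None if the host never
    answered within `attempts` tries. `probe` and `sleep` can be replaced
    for testing.
    """
    for attempt in range(1, attempts + 1):
        if probe(host):
            return attempt
        sleep(delay)
    return None
```

Once the function returns a number, the source=disable command can be entered into the field system as described in step 4.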
You might get the error
ERROR sc -13 setcl: formatter to FS time difference 0.5 seconds or greater
To fix this, enter:
sy=run setcl offset
Note this error is likely to reappear regularly.
Note also that the error message
?ERROR sc -18 setcl: program is already running, try "run setcl" instead.
has been seen recently when the command is issued from a terminal window. The problem has not been seen when the command is entered into the oprin window. If you do get this error when entering the command into the oprin window, please tell Jim.
The time or date in the field system log output is very wrong, but all the time settings appear OK; that is, the FS time in fmset is correct, the pcfs[hb][ke][yg] date command reports the correct time, etc. The problem is that the field system actually uses the hardware BIOS time from the pcfs[hb][ke][yg] computer, not the operating system time. If all the times appear OK but the field system is still incorrect, then you will need to fix the hardware BIOS time setting. To read the hardware time (and the difference from the system time), as root user:
hwclock -r
The system time comes from a local GPS receiver which runs an NTP server. Check that the pcfs[hb][ke][yg] system time is indeed correct:
ntpd -nq
The offset from the first server in the list should be less than 10 ms. Then write the current system time to the hardware clock, as root user:
hwclock -w
That the hardware clock has gone wrong probably indicates a fault, such as a bad BIOS battery on the motherboard that needs replacing.
The origin of this problem is presently unknown but the FS time can get seriously out of step. To fix this, while not recording, start the fmset program from an oper@pcfshb terminal and issue the “+” and “-” commands, then quit from fmset (ESC). Restart fmset and the FS time should now be correct. You may need to resync the mark5B pps after this procedure.
The clkoff command measures the difference in the 1 PPS (pulse per second) signal coming from the GPS with the 1PPS from the Mark5. The Mark5 1PPS has travelled through both the DBBC and Mark5 and is a good diagnostic of a timing problem in our hardware.
There are occasionally timing glitches (clock jumps) that cause the clkoff value to change. There are several possible causes:
The easiest way to check for clock stability is to compare the clkoff and maserdelay values. The difference between these two should remain stable at around 0.3 us. The Log Monitor software calculates the difference and logs it as the “Delay difference”. If this value exceeds abs(0.5) us, an alarm is sounded (by default).
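The alarm condition described above amounts to a one-line comparison. The Python sketch below illustrates it; the function names are hypothetical, and the sign convention (clkoff minus maserdelay) is an assumption, since the text does not say which way round the Log Monitor subtracts.

```python
def delay_difference(clkoff_us, maserdelay_us):
    """Difference between the two measurements, in microseconds.

    Assumed convention: clkoff minus maserdelay.
    """
    return clkoff_us - maserdelay_us


def delay_alarm(difference_us, limit_us=0.5):
    """True if the delay difference exceeds the alarm limit in magnitude."""
    return abs(difference_us) > limit_us
```

With the default limit, a difference near the usual 0.3 us passes quietly, while a jump beyond 0.5 us in either direction raises the alarm.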
The first thing to do is not panic. If the new delay remains constant and less than abs(20) us, the correlator can handle it. Re-setting the delay introduces another clock jump, which makes the correlation more difficult. So the first thing to do is to check the delay history, using the Log Monitor and the clkoff and maserdelay commands from e-RemoteCtrl, and then plot the exported delay difference:
cd /vlbobs/ivs/logs
gnuplot
plot 'yg_ddif.txt' with linespoints
This will plot the delay difference against day number. You can use the right mouse button in the plot window to zoom in. Sometimes a spurious data point will make the graph painfully small; this example gnuplot command
set yrange [-0.3:-0.275]
replot
will put the y-axis in the ball park for you. Change the numbers to suit the current offset. The command
set xrange [*:*]
replot
will put the x-axis back to the full range of the datafile if you've zoomed in with the mouse.
Every time you press “Export data” the output files are refreshed and you can replot the values in gnuplot either by typing 'replot' or by pressing the “Replot” button in the plot window. Other possibly useful files to plot are yg_maser2gps.txt, the difference between the maser and GPS 1PPS, and yg_fmout.txt, the difference between GPS and Mark5 output 1PPS.
If the delay difference is stable you don't need to do anything.
If the delay difference is more than 20 us, or gets so large that the clkoff or maserdelay values lose precision, run fmset to get the delays back to something manageable. Make sure you are not recording while running fmset! Issuing a halt command from e-RemoteCtrl followed by disk_record=off is usually a safe method.
If the delay difference is drifting (usually linearly), the DBBC probably needs reconfiguring. This can be done from e-RemoteCtrl as follows (again, best to halt the schedule and make sure you're not recording):
dbbc=reconf
Monitor how things are going in the DBBC VNC session. A reconfig takes about 2 minutes. When it's completed, synchronise the dbbc:
dbbc=pps_sync
Then in a terminal window on pcfs[hb|ke|yg], run fmset to get the clocks lined up.
Now resume observations with a cont or schedule= command.
If a reconf does not stop the clock drift, try rebooting the DBBC (using the windows start menu) and restarting the DBBC Server.
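The three cases above (stable, too large, drifting) can be told apart from the exported delay-difference series with a simple straight-line fit. The Python sketch below shows one way to do it. The function names are hypothetical, and the drift threshold of 0.01 us per hour is an assumed value chosen for illustration, not one taken from this page; the 20 us limit is the one quoted above.

```python
def fitted_slope(times, values):
    """Least-squares slope of values against times (needs 2+ distinct times)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    denominator = sum((t - mean_t) ** 2 for t in times)
    return numerator / denominator


def recommend_action(times_h, differences_us, large_us=20.0, drift_us_per_h=0.01):
    """Suggest what to do about the delay difference.

    times_h        -- sample times in hours
    differences_us -- delay difference at each time, in microseconds
    drift_us_per_h -- assumed threshold for calling the series 'drifting'
    """
    if abs(differences_us[-1]) > large_us:
        return "run fmset"
    if abs(fitted_slope(times_h, differences_us)) > drift_us_per_h:
        return "dbbc=reconf"
    return "no action"
```

In every case other than "no action", halt the schedule and make sure nothing is recording first, as described above.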
This occurs when communication with the power sensor (a USB device) in the IF rack is lost. The power sensor is required for System Temperature (Tsys) measurements. The solution is to firstly disable Tsys measurements, then cycle power to the sensor using the Internet Power Switch, then check that it's working and lastly re-enable Tsys measurements.
You can disable the Tsys measurements as follows (Hobart is used as an example here):
On pcfshb:
pfmed
pfmed: pf,station
pfmed: ed,systemp12
An editor will start. Comment out the command by putting a double-quote at the start of the line. It should then look like this:
"sy=/usr2/oper/systemp12rcp.sh &
Now exit the editor, and
pfmed: exit
Lastly, please make a note in the log that Tsys measurements have been disabled.
Next, kill any remaining systemp12rcp.sh or ReadPower.sh processes running on pcfs[hb|ke|yg] (use ps -ef | grep ReadPower to identify the process IDs). Become root with su and issue the command
/etc/init.d/AgilentU2000 restart
It will run a series of procedures to toggle the power and then try to re-establish communications. It may take two tries to get it fully working - when it is ok, you should get a blithely cheery message to this effect, and be wished good luck. When you receive this message, wait for a break in the recording and test the power sensor by running /home/oper/systemp12rcp.sh. All being well, there should be no timeouts, although the measured power is likely to be nonsensical (there will be bogus values written into the data from the previous timeouts). If it fails with timeouts, persevere with the /etc/init.d/AgilentU2000 restart procedure. Once you have it working, repeat the pfmed process and remove the comment from the systemp12 procedure.
Recording continues as econtrol is a front-end viewer for the field system, so don't panic :)
When you restart econtrol from the menu it may be unable to load the telescope information (the drop-down menu boxes), and the terminal from which econtrol runs produces “Can't open interface” type errors. If this happens, in the econtrol window (the green one, not the terminal) press Ctrl+Shift+e, and then try to open one of the drop-down boxes again. This time the icon in the bottom right corner should go from red through 'connecting' to green, the information will load, and observing can continue as normal.
When econtrol is back, check that there is a green bar above the red dot, second icon to the right of the text entry field. This indicates that a log file is being recorded. If there is no green bar on this button, press the button and specify an appropriate filename in /vlbobs/ivs/logs. Then in the Log Monitor, choose File > Open Log File and select the new file. Make a note that there are two log files.
Recording continues as econtrol is a front-end viewer for the field system, so don't panic :)
If the econtrol program can't connect even after repeated Ctrl+Shift+e commands, you should check that the econtrol daemon is running on the pcfs machine. Log in to the pcfs and run ps -ef | grep econtrold. There should be two entries in the list. If it's not running, start it with /usr2/econtrol/bin/econtrold. If it is running but you still can't connect, try killing the econtrold processes and restarting it.
If this still doesn't work, try killing the ercd process as root on the pcfs and then press Ctrl+Shift+e in econtrol.
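When reading the ps -ef | grep econtrold output, remember that the grep command itself usually shows up in the list. The Python sketch below counts matching processes while skipping the grep line. The function name is hypothetical, and the text does not say whether the expected "two entries" includes the grep line, so the function only reports the count and leaves the judgement to the operator.

```python
def count_processes(ps_output, name="econtrold"):
    """Count lines of `ps -ef` output that mention `name`, excluding grep."""
    count = 0
    for line in ps_output.splitlines():
        if name in line and "grep" not in line:
            count += 1
    return count
```

A count of zero means the daemon is not running and needs to be started as described above.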
If the All-Sky Cam goes offline at Katherine, open the timeke vncviewer and open the folder C:\thumbs. In it, there is a script called ftp_script. Double click on the shortcut to restart the script and the All-Sky Cam should run again.
If you're having trouble connecting to all computers at a remote site (i.e. Yg or Ke), the VPN connection between the site and UTAS may have been disrupted. Often this happens when VPN hardware is reset at UTAS. While the routers at the sites are set up to re-establish the connection back to the UTAS network, sometimes this can take quite a while or not work at all.
ITS have given us an account on the routers, but it can only be used when the VPN connection is lost. Fortunately, there are some computers at the remote sites which connect to the outside world without going through the UTAS VPN. We can use these machines to log in to the confused router and reset the VPN connection, assuming the physical connection to the site is still available.
To connect to the accessible computers at the site, open a VNC session by running, on ops2, the command vncviewer $PC, where $PC is either ke-via-cdu or yg-via-internode. The password is the usual. (Rather than publish the IP addresses of these computers to the www here, I've instead written them in /etc/hosts on ops2. The above names will therefore only work on that machine.)
Now you will need to log in to the UTAS router at the site via this computer. Find the PuTTY program (its icon is two connected computers). When you open it, you should see a list of “Saved Sessions”. Select 131.217.61.1 for Ke or 192.168.1.61 for Yg, then press Load, then Open. You should now be presented with a black login window for the router. The username is physics and the password is connect.
If all goes well, you should now have a shell into the router. To reset the VPN connection, type the command clear crypto ipsec client ezvpn. You can now exit this shell and close the VNC connection.
The connection should be restored fairly quickly. While you wait, you could try pinging the pcfs at the site.