operations:monitoring_hb

This wiki is not maintained! Do not use this when setting up AuScope experiments!

Differences

This shows you the differences between two versions of the page.

operations:monitoring_hb [2011/06/03 00:59]
yme
operations:monitoring_hb [2014/12/17 09:53] (current)
Warren Hankey [FS time is out by several seconds]
Line 16: Line 16:
 Below is an explanation of the current list of items to check during an observation:​ Below is an explanation of the current list of items to check during an observation:​
  
-  * **HMI: Antenna drives OK**\\ Check the antenna status display in HMI on the Windows PC (to view it, type ''​vncviewer -shared 131.217.63.225:​0''​). If DRIVES STATUS, SOFT LIMITS, DEMAND LIMITING and HARD LIMITS are all green then all is well. +  * **HMI: Antenna drives OK**\\ Check the antenna status display in HMI on the Windows PC (to view it, start a VNC session via the Applications menu to timehb). If DRIVES STATUS, SOFT LIMITS, DEMAND LIMITING and HARD LIMITS are all green then all is well. 
-  * **HMI: Time OK (i.e. SNTP server OK)**\\ Check the antenna status display in HMI on the Windows PC (to view it, type ''vncviewer -shared 131.217.63.225:0''). If the CURRENT TIME area reports SNTP SERVER OK, then the controller knows the time. If not, the antenna probably doesn't know where it's pointing and there's a problem. +  * **HMI: Time OK (i.e. SNTP server OK)**\\ Check the antenna status display in HMI on the Windows PC (to view it, start a VNC session via the Applications menu to timehb). If the CURRENT TIME area reports SNTP SERVER OK, then the controller knows the time. If not, the antenna probably doesn't know where it's pointing and there's a problem.
-  * **Antenna on source and tracking**\\ In the oprin window, type ''​onsource''​ and check the antenna is tracking, which it should be provided a new source command hasn't recently been issued. +  * **Antenna on source and tracking**\\ In econtrol, type ''​onsource''​ and check the antenna is tracking, which it should be provided a new source command hasn't recently been issued. 
-  * **Autocorrelations OK**\\ At the end of every scan, the postob procedure will run a script to extract some Mark5 data and display the autocorrelations of the 16 channels. Good data should contain quite flat bandpasses and zero phase. See [[operations:​monitoring_autocor|autocorrelation spectra plots]] for an example and what problems to look out for. If there'​s a problem, the DBBC may need reconfiguring. +  * **Autocorrelations OK**\\ At the end of every scan, the postob procedure will run a script to extract some Mark5 data and display the autocorrelations of the 16 channels ​in the PCFS VNC session. Good data should contain quite flat bandpasses and zero phase. See [[operations:​monitoring_autocor|autocorrelation spectra plots]] for an example and what problems to look out for. If there'​s a problem, the DBBC may need reconfiguring. 
-  * **delays OK, stable and within 1us (clkoff, maserdelay)**\\ In the oprin window,, issue the commands ''clkoff'' and ''maserdelay''. These values should be within 1 microsecond of each other and stable (i.e. similar results if you issue the commands again). See [[operations:monitoring_hb#clkoff_reading_is_drifting_or_far_from_the_maser-gps_offset|this entry]] in the common Problems secion for a remedy. +  * **delays OK, stable and within 1us (clkoff, maserdelay)**\\ In econtrol, issue the commands ''clkoff'' and ''maserdelay''. These values should be within 0.5 microseconds of each other and stable (i.e. similar results if you issue the commands again). The monitoring software will calculate the difference for you and should ring an alarm if the difference is not acceptable. See [[operations:monitoring_hb#clkoff_reading_is_drifting_or_far_from_the_maser-gps_offset|this entry]] in the common Problems section for a remedy.
-  * **Maser status OK**\\ Check the "Standard VCH-1005A Manager" display on the Windows PC (to view it, type ''vncviewer -shared 131.217.63.225:0''). Green numbers are good, red are bad. Here's an example of how it should look:\\  {{:operations:maserok.jpg?200|}}\\ Report any red numbers to Brett ASAP. If you see mention of 'Battery', the maser has lost mains power and is running on its UPS. If so, tell Brett immediately. +  * **Maser status OK**\\ Check the "Standard VCH-1005A Manager" display on the Windows PC (to view it, start a VNC session via the Applications menu to timehb). Green numbers are good, red are bad. Here's an example of how it should look:\\  {{:operations:maserok.jpg?200|}}\\ Report any red numbers to Brett ASAP. If you see mention of 'Battery', the maser has lost mains power and is running on its UPS. If so, tell Brett immediately.
-  * **mk5=mode? correct**\\ Check mode with this command in the oprin window: ''​mk5=mode?''​. The result should be\\ ''/​mk5/​!mode?​ 0 : ext : 0x55555555 : 2 : 2 ;''​ for R1 and R4 experiments,​\\ ''/​mk5/​!mode?​ 0 : ext : 0x55555555 : 4 : 2 ;''​ for OHIG, APSG and CRF observations. +  * **mk5=mode? correct**\\ Check mode with this command in econtrol: ''​mk5=mode?''​. The result should be\\ ''/​mk5/​!mode?​ 0 : ext : 0x55555555 : 2 : 2 ;''​ for R1 and R4 experiments,​\\ ''/​mk5/​!mode?​ 0 : ext : 0x55555555 : 4 : 2 ;''​ for OHIG, APSG and CRF observations. 
-  * **mk5=dot? response nominal**\\ This is a check of the Mark5 decoder time. Check the time offset in the formatter with theis command in the oprin wondow:''mk5=dot?''. Make sure it reports a small offset as the final value, than ''syncerr_eq_0'' and that ''FHG_on'' or ''FHG_off'' depending on whether it is currently recording or not. +  * **mk5=dot? response nominal**\\ This is a check of the Mark5 decoder time. Check the time offset in the formatter with this command in econtrol: ''mk5=dot?''. Make sure it reports a small offset (~<10 ms) as the final value, that it shows ''syncerr_eq_0'', and that it shows ''FHG_on'' or ''FHG_off'' depending on whether it is currently recording or not.
-  * **disk_pos OK**\\ The command ''disk_pos'' in the oprin window should report three values - the current number of bytes recorded, bytes at start of previous scan and bytes at start of current scan. If not currently recording, the first and third values should agree. +  * **disk_pos OK**\\ The command ''disk_pos'' in econtrol should report three values - the current number of bytes recorded, bytes at start of previous scan and bytes at start of current scan. If not currently recording, the first and third values should agree. It is normal for Yarragadee ''disk_pos'' to lag its expected value due to regular stows for USN uplinks.
   * **Weather (wth) being logged**\\ Look through recent messages in the field system log for output from the ''​wth''​ command, which will look like this:​\\/#​wx#/​16.1,​1007.9,​58.6\\Also make a note in the log of present weather conditions (if you're at the observatory).   * **Weather (wth) being logged**\\ Look through recent messages in the field system log for output from the ''​wth''​ command, which will look like this:​\\/#​wx#/​16.1,​1007.9,​58.6\\Also make a note in the log of present weather conditions (if you're at the observatory).
   * **S-band Tsys OK (~15-17)**\\ Check recent output from a systemp12 command (don't execute it unless the Mark5 is NOT recording) to see if the S-band Tsys is within the expected range: about 15 to 17 cal units. Look for "tsysS" in the log. Make a note in the log if it is outside this range. If it persists, or the values vary wildly, there may be a problem.   * **S-band Tsys OK (~15-17)**\\ Check recent output from a systemp12 command (don't execute it unless the Mark5 is NOT recording) to see if the S-band Tsys is within the expected range: about 15 to 17 cal units. Look for "tsysS" in the log. Make a note in the log if it is outside this range. If it persists, or the values vary wildly, there may be a problem.
   * **X-band Tsys OK (~5-7)**\\ Check recent output from a systemp12 command (don't execute it unless the Mark5 is NOT recording) to see if the X-band Tsys is within the expected range: about 5 to 7 cal units. Look for "tsysS" in the log. Make a note in the log if it is outside this range. If it persists, or the values vary wildly, there may be a problem.   * **X-band Tsys OK (~5-7)**\\ Check recent output from a systemp12 command (don't execute it unless the Mark5 is NOT recording) to see if the X-band Tsys is within the expected range: about 5 to 7 cal units. Look for "tsysS" in the log. Make a note in the log if it is outside this range. If it persists, or the values vary wildly, there may be a problem.
-  * **Any problems or concerns logged**\\ If there are any other issues or unusual behavior, report it in the log +  * **Any problems or concerns logged**\\ If there are any other issues or unusual behaviour, report it in the log by typing a comment preceded by double quotes in econtrol.
-  * **Field System time (monit2) agrees with station time**\\ Compare the clock shown in the monit2 display with the station clock (if you're at the observatory) or with the TAC32 GPS clock in the Tac32Plus display on the Windows PC (to view it, type ''​vncviewer -shared 131.217.63.225:​0''​). The seconds should tick over together. If they don't, the clocks probably need synchronizing. To run the monit2 status monitor, enter this command at the pcfshb prompt <​code>/​usr/​bin/​xterm -name monit2 -e /​usr2/​fs/​bin/​monit2</​code>​+  * **Field System time (monit2) agrees with station time**\\ Compare the clock shown in the monit2 display ​in the PCFS VNC session ​with the station clock (if you're at the observatory) or with the TAC32 GPS clock in the Tac32Plus display on the Windows PC (to view it, start a VNC session via the Applications menu to timehb). The seconds should tick over together. If they don't, the clocks probably need synchronizing. To run the monit2 status monitor, enter this command at the pcfshb prompt ​in the VNC session ​<​code>/​usr/​bin/​xterm -name monit2 -e /​usr2/​fs/​bin/​monit2</​code>​
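The ''mk5=mode?'' check in the list above compares a response string against two known-good values, so it can be scripted. The Python sketch below is only an illustration and is not part of the station software: the function names and the lookup table are hypothetical, and it assumes the response has exactly the colon-separated layout quoted in the list.

```python
# Hypothetical helper, not part of the Field System or econtrol.
# The table holds the fourth value of the mk5=mode? response quoted above:
# 2 for R1 and R4 experiments, 4 for OHIG, APSG and CRF observations.
EXPECTED_FOURTH_FIELD = {"R1": "2", "R4": "2", "OHIG": "4", "APSG": "4", "CRF": "4"}


def parse_mode_response(line):
    """Split '/mk5/!mode? 0 : ext : 0x55555555 : 2 : 2 ;' into its fields."""
    body = line.split("?", 1)[1]       # keep the text after '!mode?'
    body = body.strip().rstrip(";")    # drop the trailing ';'
    return [field.strip() for field in body.split(":")]


def mode_is_correct(line, experiment_type):
    """Return True if the response matches the value listed for this experiment type."""
    expected = ["0", "ext", "0x55555555", EXPECTED_FOURTH_FIELD[experiment_type], "2"]
    return parse_mode_response(line) == expected
```

For example, ''mode_is_correct("/mk5/!mode? 0 : ext : 0x55555555 : 2 : 2 ;", "R1")'' is true, while the same response checked against ''"CRF"'' is false.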
  
  
Line 45: Line 45:
  
 ==== FS time is out by several seconds ==== ==== FS time is out by several seconds ====
-The origin of this problem is presently unknown but the FS time can get seriously out of step. To fix this, start the fmset program from an oper@pcfshb terminal and issue the "​+"​ and "​-"​ commands, then quit from fmset (ESC). Restart fmset and the FS time should now be correct. You may need to resynch ​the mark5B pps after this procedure. ​+The origin of this problem is presently unknown but the FS time can get seriously out of step. To fix this, **while not recording** ​start the ''​fmset'' ​program from an ''​oper@pcfshb'' ​terminal and issue the "​+"​ and "​-"​ commands, then quit from fmset (ESC). Restart fmset and the FS time should now be correct. You may need to resync ​the mark5B pps after this procedure.  
 + 
 +Be sure to check that FHG=off.  Sometimes if there is a power glitch while the Mark5 is still recording, it can get 'stuck' in record mode.  If so, stop the recording with ''disk_record=off'', then run fmset again.
  
 ==== clkoff reading is drifting or far from the maser-GPS offset ==== ==== clkoff reading is drifting or far from the maser-GPS offset ====
-This usually is caused by the DBBC. First, go around the back of rack 14 and move the cable from the DOTMON output of the Mark5B to the "1 PPS Mon" output of the DBBC (left hand side, sixth SMA from the top). If the same offset is seen on the counter, then the problem is in the DBBC. A temporary fix can be achieved with ''pps_sync'' in DBBC Control but this did not reliably fix the problem on 13/10/10. Instead, try reconfiguring the DBBC with ''reconf'' - this will take ~two minutes in total and you will need to re-issue the ''dbbcifa=...'' commands, and resynch the mark5B with ''fmset''. +The clkoff command measures the difference between the 1 PPS (pulse per second) signal coming from the GPS and the 1PPS from the Mark5. The Mark5 1PPS has travelled through both the DBBC and Mark5 and is a good diagnostic of a timing problem in our hardware.
 + 
 +There are occasionally timing glitches (clock jumps) that cause the clkoff value to change. There are several possible causes: 
 +  - Spurious signals on the 1 PPS signal. For example at Yarragadee we sometimes see a clock jump when the antenna drives are powered on. We also sometimes see it as a result of poor earthing or a bad connection in the cable between the DBBC and Mark5.
 +  - DBBC problem. Sometimes the DBBC (which uses the 1PPS from the maser and passes its timing on to the Mark5) can become unstable and the 1PPS signal will start to drift.
 + 
 +The easiest way to check for clock stability is to compare the clkoff and maserdelay values. The difference between these two should remain stable at around 0.3 us. The Log Monitor software calculates the difference and logs it as the "Delay difference"​. If this value exceeds abs(0.5) us, an alarm is sounded (by default). 
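
The alarm condition described above is a threshold on the absolute delay difference. As a rough illustration only (this is not the Log Monitor code), it can be written as the Python sketch below. The function names are hypothetical, both readings are assumed to be in microseconds, and the sign convention (''clkoff'' minus ''maserdelay'') is an assumption that does not affect the alarm test.

```python
# Illustrative sketch only; the real Log Monitor implementation is not shown in this page.
ALARM_THRESHOLD_US = 0.5  # from the text: alarm when abs(difference) exceeds 0.5 us


def delay_difference_us(clkoff_us, maserdelay_us):
    """Delay difference in microseconds (sign convention assumed: clkoff - maserdelay)."""
    return clkoff_us - maserdelay_us


def alarm_needed(clkoff_us, maserdelay_us, threshold_us=ALARM_THRESHOLD_US):
    """Return True when the absolute delay difference exceeds the threshold."""
    return abs(delay_difference_us(clkoff_us, maserdelay_us)) > threshold_us
```

With readings of 10.3 us and 10.0 us the difference is about 0.3 us and no alarm is raised; readings of 12.0 us and 10.0 us would raise it.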
 + 
 +=== So what do I do if there'​s a clock jump? === 
 +The first thing to do is not panic. If the new delay remains constant and less than abs(20) us, the correlator can handle it. Re-setting the delay introduces another clock jump which makes the correlation more difficult. So the first thing to do is in the Log Monitor: 
 +  - Press "​Acknowledge alarm"​ 
 +  - Under the "Configure" menu, select either:
 +    - "Delay monitoring -> Audible warning"​ which will make the monitor software beep every time it sees a > abs(0.5) us offset, rather than sound the alarm, or... 
 +    - "Delay monitoring -> Silent warning"​ which will log that the offset is large but not beep or ring alarms. This should be used with caution! 
 +  - Now monitor the Delay difference and see if it has stabilised. You can do this in several ways: 
 +    - Watch the Delay difference values in the log monitor window. You can get more frequent updates by issuing regular ''clkoff'' and ''maserdelay'' commands from e-RemoteCtrl.
 +    - Get Log Monitor to extract a history of the delay and delay difference values by pressing the "Export Data" button. When you do this, several ascii files will be written to /vlbobs/ivs/logs. The file that will be of most interest is (e.g. for Yarragadee) /vlbobs/ivs/logs/yg_ddif.txt. You can open this file and read its contents, or you can use a plotting program like gnuplot to plot the values. This is especially useful if you want to see if the new offset is stable or not:
 +      - from a terminal window:<​code>​cd /​vlbobs/​ivs/​logs 
 +gnuplot 
 +plot 'yg_ddif.txt' with linespoints
 +</​code>​ This will plot the delay difference against day number. You can use the right mouse button ​in the plot window to zoom in. Every time you press "​Export data" the output files are refreshed ​and you can replot the values in gnuplot either by typing '​replot'​ or by pressing the "​Replot"​ button in the plot window. Other possible useful files to plot are ''​yg_maser2gps.txt'',​ the difference between the maser and GPS 1PPS, and ''​yg_fmout.txt'',​ the difference between GPS and Mark5 output 1PPS. 
 +      - Separate windows [0|1|2] can be opened for each station by replacing the final command above with:<code>
 +set terminal ​'wxt' ​2; plot '​yg_ddif.txt'​ 
 +</​code>​  
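
As an alternative to judging stability by eye in gnuplot, the drift can be estimated with a straight-line fit. The Python sketch below is a hedged illustration: it assumes the exported file holds two whitespace-separated columns (day number, then delay difference in microseconds), which this page does not state explicitly, and the function names are hypothetical.

```python
# Illustrative sketch; the two-column layout of yg_ddif.txt is an assumption.
def read_two_columns(path):
    """Read (day number, delay difference) pairs, skipping lines that do not parse."""
    days, values = [], []
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) < 2:
                continue
            try:
                day, value = float(parts[0]), float(parts[1])
            except ValueError:
                continue  # header or comment line
            days.append(day)
            values.append(value)
    return days, values


def drift_per_day(days, values):
    """Least-squares slope of delay difference against day number (us per day)."""
    n = len(days)
    if n < 2:
        raise ValueError("need at least two points to estimate a drift")
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("all points have the same day number")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    return sxy / sxx
```

A slope close to zero suggests the new offset is stable; a clearly non-zero slope suggests a drift. What counts as "close to zero" is a judgment call that this page does not quantify.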
 + 
 +=== So when do I need to reconfigure the DBBC, run fmset etc? === 
 + 
 +If the delay difference is stable you don't need to do anything. 
 + 
 +If the delay difference is more than 20 us, or gets so large that the ''clkoff'' or ''maserdelay'' values lose precision, run ''fmset'' to get the delays back to something manageable. //Make sure you are not recording while running fmset! Issuing a ''halt'' command from e-RemoteCtrl followed by ''disk_record=off'' is usually a safe method.//
 + 
 +The first thing to do is try the command 
 +<​code>​counter</​code>​ 
 +in e-RemoteCtrl. Check to see if this worked by typing ''clkoff'' and ''maserdelay''. If this doesn't fix it, proceed with the steps below.
 + 
 +If the delay difference is drifting (usually linearly), the DBBC probably needs reconfiguring. This can be done from e-RemoteCtrl as follows (again, best to halt the schedule ​and make sure you're not recording):​ 
 +<​code>​dbbc=reconf</​code>​ 
 +Monitor how things are going in the DBBC VNC session. A reconfig takes about 2 minutes. When it's completed, synchronise the dbbc: 
 +<​code>​dbbc=pps_sync</​code>​ 
 +Then in a terminal window on pcfs[hb|ke|yg],​ run fmset to get the clocks lined up. 
 + 
 +Now resume observations with the ''cont'' or ''schedule='' command.
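
The decision procedure in this section can be summarised as a small lookup. The Python sketch below is one reading of the text, not an official procedure: the function name is hypothetical, the initial ''counter'' attempt described above is not modelled, and ''fmset'' is a program run in a terminal on the pcfs machine rather than an e-RemoteCtrl command.

```python
# One possible reading of the steps above; not an official procedure.
DIFFERENCE_LIMIT_US = 20.0  # from the text: fmset is needed above 20 us


def recovery_commands(abs_delay_difference_us, is_drifting):
    """Return the ordered steps suggested by the text for a given situation."""
    stop_recording = ["halt", "disk_record=off"]  # never run fmset while recording
    if is_drifting:
        # A drifting difference points at the DBBC: reconfigure, sync, then fmset.
        return stop_recording + ["dbbc=reconf", "dbbc=pps_sync", "fmset", "cont"]
    if abs_delay_difference_us > DIFFERENCE_LIMIT_US:
        # Stable but too large for comfort: reset the delay with fmset.
        return stop_recording + ["fmset", "cont"]
    # Stable and within the limit: leave it alone to avoid a second clock jump.
    return []
```

The final ''cont'' entry stands for resuming observations; ''schedule='' may be needed instead, as noted above.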
 ==== PCFS log window reports problem with ReadPower.sh ==== ==== PCFS log window reports problem with ReadPower.sh ====
  
Line 79: Line 123:
  WARNING: ONSOURCE status is SLEWING. ​  WARNING: ONSOURCE status is SLEWING. ​
  
-You will also notice that the antenna control/monitoring GUI (called HMI) on the Windows PC will show constant azimuth position, and probably the Azimuth brakes on. You can see this display by issuing the following command from newsmerd: +You will also notice that the antenna control/monitoring GUI (called HMI) on the Windows PC will show constant azimuth position, and probably the Azimuth brakes on. You can see this display by starting up a VNC session to timehb.
- +
- vncviewer -shared 131.217.63.225:​0+
  
 To fix this problem, click on “Reboot System”, then either wait for the schedule to send the antenna to the next source, or look back through the schedule and re-issue the last “source=…” command. (Note the ‘onsource’ command doesn’t seem to remedy the problem at the moment. Check the snap file for the syntax of the command & the most recent usage). The “Reboot System” button is shown here: To fix this problem, click on “Reboot System”, then either wait for the schedule to send the antenna to the next source, or look back through the schedule and re-issue the last “source=…” command. (Note the ‘onsource’ command doesn’t seem to remedy the problem at the moment. Check the snap file for the syntax of the command & the most recent usage). The “Reboot System” button is shown here:
Line 88: Line 130:
  
 The above screenshot shows the antenna in a healthy state. You will see various boxes in the POWER and DRIVES STATUS areas go red when there’s a problem. The above screenshot shows the antenna in a healthy state. You will see various boxes in the POWER and DRIVES STATUS areas go red when there’s a problem.
 +
 +==== If econtrol gets closed during an observation ====
 +
 +Recording continues as econtrol is a front-end viewer for the field system, so don't panic :)
 +
 +When you restart econtrol from the menu it may be unable to load the telescope information (the drop-down menu boxes), and the terminal from which econtrol runs produces "​Can'​t open interface"​ type errors. If this happens, in the econtrol window (the green one, not the terminal) press ''​Control+shift+e'',​ and then try to open one of the drop down boxes again - this time the icon in the bottom right corner should go from red through '​connecting'​ to green, the information will now load, and observing can continue as normal.
/home/www/auscope/opswiki/data/attic/operations/monitoring_hb.1307062787.txt.gz · Last modified: 2011/10/26 06:37 (external edit)