Enhancement #3289
MultiWAN: remove static routes for checkip
Status: | CLOSED | Start date: | |
---|---|---|---|
Priority: | Normal | Due date: | |
Assignee: | - | % Done: | 100% |
Category: | <multiple packages> | | |
Target version: | v6.7 | | |
Resolution: | | NEEDINFO: | No |
Description
Currently the user is forced to choose one "ping IP" (AKA check IP) for each provider.
While it works, this setup has a drawback: the ping IP can't be reached when a link goes down, so the system needs to create a static route for each check IP.
- delete the code that tries to auto-detect the right check IP
- delete the static routes
- do not use shorewall disable in the lsm down script
- adjust only "balance" routing table on link status change
Reference: http://community.nethserver.org/t/new-configuration-for-the-multi-wan-monitoring/1829
Associated revisions
conf: use checkip prop from firewall key. Refs #3289
Interface configuration: remove provider static routes. Refs #3289
Remove provider static routes. Refs #3289
wan-update event: re-add ip rule for provider. Refs #3289
WebUI: move check ip inside common options. Refs #3289
DB: remove old checkip, add new CheckIP prop. Refs #3289
Revert "lsm: avoid useless restarts" Refs #3289
This reverts commit b74e6cdc57f251b037f6e0506dc7d2fe623b8042.
conf: use new props, move event script to groups. Refs #3289
New properties:
- MaxPercentPacketLoss
- MaxNumberPacketLoss
- PingInterval
- NotifyWan
spec: requires lsm >= 0.190 for groups. Refs #3289
lsm.conf: lower debug level. Refs #3289
DB: add props for link status monitor. Refs #3289
WAN update action: change logic for groups. Refs #3289
Web UI: add LSM options. Refs #3289
lsm.conf: force group status. Refs #3289
WAN notify: update mail text. Refs #3289
Web UI: improve inline help and labels. Refs #3289
Minor corrections to inline help. Refs #3289
createlinks: add static-routes-save to nethserver-firewall-base-update. Refs #3289
lsm.conf: fix missing new line. Refs #3289
multiwan: latency measure with lsm. Refs #3289
multiwan: remove latency measure with ping. Refs #3289
multiwan: remove latency measure with ping. Refs #3289
History
#1 Updated by Giacomo Sanchietti almost 6 years ago
- Status changed from NEW to TRIAGED
- Target version set to v6.7
- % Done changed from 0 to 20
#2 Updated by Giacomo Sanchietti almost 6 years ago
- Status changed from TRIAGED to ON_DEV
- Assignee set to Giacomo Sanchietti
- % Done changed from 20 to 30
#3 Updated by Giacomo Sanchietti almost 6 years ago
- Subject changed from MultiWAN: remove routes for checkip to MultiWAN: remove static routes for checkip
#4 Updated by Giacomo Sanchietti almost 6 years ago
- Status changed from ON_DEV to MODIFIED
- % Done changed from 30 to 60
- multiple check ip support (requires LSM 0.190)
- notification mail on provider status change
- configurable sensitivity of LSM
#5 Updated by Giacomo Sanchietti almost 6 years ago
- Status changed from MODIFIED to ON_QA
- Assignee deleted (Giacomo Sanchietti)
- % Done changed from 60 to 70
- nethserver-base-2.9.2-1.2.gf6d0790.ns6.noarch.rpm
- nethserver-lsm-1.0.2-1.6.g3388355.ns6.noarch.rpm
- lsm-0.190-1.ns6.x86_64.rpm
- nethserver-firewall-base-ui-2.8.0-1.16.g9f1aa9e.ns6.noarch.rpm
- nethserver-firewall-base-2.8.0-1.16.g9f1aa9e.ns6.noarch.rpm
NOTE: some test cases require knowledge of advanced ip routing commands
Test case 1
- Upgrade an already configured machine with at least 2 providers
- Check the checkip address is deleted from all configured providers (db networks show)
- Check the multi wan configuration is still working
- Check that all check IPs are still reachable
- Inspect ip rules and ip routes
Test case 2
- After test case 1
- Try to put a provider in down state by blocking the traffic on the router (or detaching the cable)
- Check the hosts behind the firewall can still reach Internet
- Re-enable traffic for the provider
- Check the provider comes up and both links are used to access Internet
- Inspect ip rules and ip routes
Test case 3
- Change LSM parameters
- Check parameters are applied to LSM
- To verify the state of LSM use: pkill -SIGUSR1 lsm and see /var/log/messages
Test case 4
- Enable mail notification and set From and To fields
- Try to force a down/up state on a provider
- Check the mail is sent
#6 Updated by Adam P almost 6 years ago
- Assignee set to Adam P
#7 Updated by Adam P almost 6 years ago
- Status changed from ON_QA to TRIAGED
- Assignee deleted (Adam P)
- % Done changed from 70 to 20
System and Package Version installed
ESXi 5.1 VM - Clean install of Nethserver 6.7 fully updated - 3 eth
Packages installed: Basic firewall, Bandwidth monitor, DNS and DHCP server, Intrusion Prevention System, VPN, Web filter, Web proxy, Web server
Test Original Problem
Setup WAN1 with check IP of 8.8.4.4 and WAN2 with check IP of 4.2.2.2. Simulated a wan failure and static routes existed. Check IPs were not reachable when an interface was down.
Install Updated Package
The following commands installed all mentioned updated packages:
yum --enablerepo=nethserver-testing update nethserver-base
yum --enablerepo=nethserver-testing update nethserver-lsm
yum --enablerepo=nethserver-testing update nethserver-firewall-base-ui
Packages:
nethserver-base-2.9.2-1.2.gf6d0790.ns6.noarch.rpm
nethserver-lsm-1.0.2-1.6.g3388355.ns6.noarch.rpm
lsm-0.190-1.ns6.x86_64.rpm
nethserver-firewall-base-ui-2.8.0-1.16.g9f1aa9e.ns6.noarch.rpm
nethserver-firewall-base-2.8.0-1.16.g9f1aa9e.ns6.noarch.rpm
Test Results after update
Test case 1
*Upgrade an already configured machine with at least 2 providers
*Check the checkip address is deleted from all configured providers (db networks show)
-Confirmed. Check IPs and field were removed from each provider
*Check the multi wan configuration is still working
-Multi wan appears to still work. WAN connectivity to W7 and Ubuntu workstations fails over after simulated outage/failure. Static routes from old multi wan configuration are still static.
*Check that all check IPs are still reachable
-All check IPs are reachable when both WANs are up. When one goes down, even after failover, they're not reachable.
*Inspect ip rules and ip routes
Test case 2
*After test case 1
*Try to put a provider in down state by blocking the traffic on the router (or detaching the cable)
-disconnected virtual nic in vsphere.
*Check the hosts behind the firewall can still reach Internet
-hosts behind NS can reach internet after about 20 seconds
*Re-enable traffic for the provider
-reconnected virtual nic in vsphere
*Check the provider comes up and both links are used to access Internet
-both links are used and eventually all traffic fails over to the higher priority nic.
*Inspect ip rules and ip routes
Test case 3
*Change LSM parameters
-changed check ip
*Check parameters are applied to LSM
-check ip was changed. appeared to apply.
*To verify the state of LSM use: pkill -SIGUSR1 lsm and see /var/log/messages
Test case 4
*Enable mail notification and set From and To fields
-enabled email notification field and set from and to fields to email addresses in my domain
*Try to force a down/up state on a provider
-disconnected the virtual nic a couple times - no email and no logs in spam filter appliance. how does this send emails? would options for smtp settings be helpful?
*Check the mail is sent
Verified or Reopen
Reopen
Note
I also noticed that the default check IP of 8.8.8.8 is not accessible when the primary internet connection is taken offline. After it is reconnected, all traffic fails back and communication with the check IP is possible again. I changed the check IP to 4.2.2.3 and still experienced the same behavior with 8.8.8.8 as well as with 4.2.2.3, but on WAN2. After further testing, every IP I ping through one WAN becomes a static route through that WAN connection; it's unreachable once the WAN the static route went through is down. Tried 'ip ro flush cache' to no avail.
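To see which WAN a given destination is currently pinned to, the output of `ip route get <address>` can be inspected. The helper below is a minimal sketch, not part of any NethServer package: `route_dev` is a hypothetical name, and it assumes the usual `ip route get` layout where the interface name follows the word `dev`.

```shell
#!/bin/sh
# Hypothetical helper: read `ip route get` output on stdin and print the
# interface name that follows the first "dev" keyword.
route_dev() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "dev") { print $(i + 1); exit }
        }
    }'
}
```

Usage on the firewall would be `ip route get 8.8.8.8 | route_dev`, run once with both WANs up and once with one WAN down, to compare the chosen interface.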
#8 Updated by Filippo Carletti almost 6 years ago
Adam P wrote:
Static routes from old multi wan configuration are still static.
Static routes should have disappeared.
Could you please try with
service network restart
to see if those routes really disappear?
ip ro
Some tests will fail if the static routes are still present. Please repeat all tests. Thank you.
how does this send emails? would options for smtp settings be helpful?
mail command is used, it should work out of the box.
A test could be:
- set lsm debug (debug=10 in /etc/lsm/lsm.conf)
- restart lsm (this is the actual command to restart lsm :-))
- see verbose output in /var/log/messages (look for forkexec)
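The last step of that check can be scripted. This is a minimal sketch that assumes only what is stated above: the verbose output lands in /var/log/messages and the relevant lines contain the word forkexec. The function name `lsm_forkexec_lines` and the default of 20 lines are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent log lines mentioning "forkexec".
#   $1 = log file (defaults to /var/log/messages)
#   $2 = maximum number of lines to print (defaults to 20)
lsm_forkexec_lines() {
    log="${1:-/var/log/messages}"
    grep 'forkexec' "$log" | tail -n "${2:-20}"
}
```

After forcing a provider down and up, `lsm_forkexec_lines` with no arguments should print something if LSM tried to run its notification script; empty output suggests the script was never invoked.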
#9 Updated by Adam P almost 6 years ago
I currently only have one check ip specified: 4.2.2.3
With eth2 disconnected, I ran 'service network restart'. It took down all internet access until I rebooted the NS. After reboot, internet was back up so I ran 'ip ro' and got the following results:
ip ro
8.8.4.4 via 192.168.99.1 dev eth0
4.2.2.2 via 192.168.4.1 dev eth2
Those are the static routes that were defined before I upgraded to the test rpm. It appears they were not cleared after upgrading for some reason.
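A quick way to repeat this check after an upgrade is to filter the `ip ro` output for routes whose destination is one of the old check IPs. The function below is a sketch under that assumption; `find_host_routes` is a hypothetical name and the list of IPs has to be supplied by hand.

```shell
#!/bin/sh
# Hypothetical helper: read `ip route` output on stdin and print every route
# whose destination (first field) is one of the IPs given as arguments and
# which goes through a gateway ("via").
find_host_routes() {
    awk -v ips="$*" '
        BEGIN {
            n = split(ips, list, " ")
            for (i = 1; i <= n; i++) want[list[i]] = 1
        }
        ($1 in want) && $2 == "via" { print }
    '
}
```

For the setup described here the call would be `ip ro | find_host_routes 8.8.4.4 4.2.2.2`; any output means the old static routes are still installed.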
As far as mail, does the mail command resolve MX records and send mail directly? That could be an issue with SPF and PTR records not matching. That may be why the email never made it through. My spam appliance may have refused the smtp session.
Edit: I started receiving emails but found that they're only sent from the eth2 IP. I assume there's a route problem there too.
#10 Updated by Filippo Carletti almost 6 years ago
Adam P wrote:
With eth2 disconnected, I ran 'service network restart'. It took down all internet access until I rebooted the NS.
I can't explain this behavior. If you could share (even privately) your full configuration and logs I may figure out what happened.
Those are the static routes that were defined before I upgraded to the test rpm. It appears they were not cleared after upgrading for some reason.
Did those static routes go away after reboot? Do you find /etc/sysconfig/network-scripts/route-*?
As far as mail, does the mail command resolve MX records and send mail directly?
mail uses the local mail system, i.e. postfix.
Edit: I started receiving emails but found that they're only sent from the eth2 IP. I assume there's a route problem there too.
I think that postfix can choose every available ip.
#11 Updated by Adam P almost 6 years ago
Filippo Carletti wrote:
Did those static routes go away after reboot? Do you find /etc/sysconfig/network-scripts/route-*?
They did not. I have rebooted NS several times. I did find the route files. one named route-eth0 and one named route-eth2. Both contain one line (the routes stated above).
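The leftover files can be listed with a small loop. This is only a sketch: `list_route_files` is a hypothetical name, and the directory defaults to the path mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print every route-* file found in the given directory
# (defaults to /etc/sysconfig/network-scripts). Prints nothing if none exist.
list_route_files() {
    dir="${1:-/etc/sysconfig/network-scripts}"
    for f in "$dir"/route-*; do
        # When the glob matches nothing it stays literal, so test existence.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running `list_route_files` and then `cat` on each reported file shows which static routes would be restored at the next network restart or reboot.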
I think that postfix can choose every available ip.
It didn't seem to work that way. It was only using eth0. When eth0 goes down, I did not get alert emails. When eth2 goes down, I did receive alerts.
I'll set up remote access to this NS instance and PM it to you.
#12 Updated by Filippo Carletti almost 6 years ago
Adam P wrote:
Filippo Carletti wrote:
Did those static routes go away after reboot? Do you find /etc/sysconfig/network-scripts/route-*?
They did not. I have rebooted NS several times. I did find the route files. one named route-eth0 and one named route-eth2. Both contain one line (the routes stated above).
We need a signal-event interface-update to clean the old static routes.
#13 Updated by Giacomo Sanchietti over 5 years ago
- Status changed from TRIAGED to ON_DEV
- Assignee set to Giacomo Sanchietti
- % Done changed from 20 to 30
#14 Updated by Giacomo Sanchietti over 5 years ago
Filippo Carletti wrote:
We need a signal-event interface-update to clean the old static routes.
We should only need to link the static-routes-save action inside the nethserver-firewall-base-update event.
#15 Updated by Giacomo Sanchietti over 5 years ago
- Status changed from ON_DEV to MODIFIED
- % Done changed from 30 to 60
#16 Updated by Giacomo Sanchietti over 5 years ago
- Status changed from MODIFIED to ON_QA
- Assignee deleted (Giacomo Sanchietti)
- % Done changed from 60 to 70
- nethserver-firewall-base-2.8.0-1.18.gfc25d0e.ns6.noarch.rpm
- nethserver-firewall-base-ui-2.8.0-1.18.gfc25d0e.ns6.noarch.rpm
- nethserver-base-2.9.2-1.2.gf6d0790.ns6.noarch.rpm
- nethserver-lsm-1.0.2-1.7.ga618357.ns6.noarch.rpm (replaces nethserver-lsm-1.0.2-1.6.g3388355.ns6.noarch.rpm)
- lsm-0.190-1.ns6.x86_64.rpm
Repeat the test case.
Please note that nethserver-base must be installed before nethserver-firewall-base to get rid of old static routes:
yum --enablerepo=nethserver-testing update nethserver-base nethserver-firewall-base-* nethserver-lsm lsm
#17 Updated by Adam P over 5 years ago
- Assignee set to Adam P
#18 Updated by Adam P over 5 years ago
- Status changed from ON_QA to VERIFIED
- Assignee deleted (Adam P)
- % Done changed from 70 to 90
Confirmed. Static routes are gone and MultiWAN functions as expected.
#19 Updated by Giacomo Sanchietti over 5 years ago
- Status changed from VERIFIED to CLOSED
- % Done changed from 90 to 100
- nethserver-lsm-1.1.0-1.ns6.noarch.rpm
- lsm-0.190-1.ns6.x86_64.rpm
- nethserver-base-2.9.3-1.ns6.noarch.rpm
- nethserver-firewall-base-ui-2.9.0-1.ns6.noarch.rpm
- nethserver-firewall-base-2.9.0-1.ns6.noarch.rpm