IRC BNC not picking up desired hostname after data center switch


9 replies to this topic

#1 nagyon

nagyon

    Will I break 10 posts?

  • Members
  • 7 posts
  • Gender:Not Telling

Posted 26 August 2009 - 10:10 AM

First off, this being my first post here, I'd just like to say hello to everyone. I came across binrev.com while searching for a solution to the problem I'm about to describe and it seems like this community and I are a natural fit... time will tell; on to my problem :-)

I've had my bouncer running for a few years on one server whose primary use is serving websites. The machine has 5-6 IPs allocated to it, one of which is used by the bnc for rdns. Recently, I moved into a new data center (opposite coast; I'm still at the old data center as well). I decided to migrate my bnc to a machine there since it's geographically much closer to me, and the machine it's going on is a fail-over, so it usually has very few resources in use. I released the IP in the old data center, allocated a new IP in the new data center and set rdns again. The problem is that the bnc is adopting another IP's rdns instead of the desired one.

Here's a more concrete example, describing the new machine's interface file (using fake IPs; assume each A record has a corresponding PTR):
172.16.0.1 hostname1
172.16.0.2 hostname1
172.16.0.3 hostname1
172.16.0.4 desiredhostname

My bnc is listening on 172.16.0.4, and tcpdump without the -n switch shows the host as desiredhostname.54321 (the latter part being the port), yet WITH -n it shows the IP as 172.16.0.1 (which is the base IP, bound to eth0). This is extremely confusing to me. I think I'm just missing something obvious, but I can't figure it out (and I want to keep my hair, hence asking for help).

Nag

#2 mecca_

mecca_

    DDP Fan club member

  • Members
  • 54 posts

Posted 29 August 2009 - 03:01 PM

My bnc is listening on 172.16.0.4, and tcpdump without the -n switch shows the host as desiredhostname.54321 (the latter part being the port), yet WITH -n it shows the IP as 172.16.0.1 (which is the base IP, bound to eth0). This is extremely confusing to me. I think I'm just missing something obvious, but I can't figure it out (and I want to keep my hair, hence asking for help).


In the case of your tcpdump test, it sounds like possibly some conflicting information or a typo in /etc/hosts.

Are you sure that your BNC is listening on the correct IP and not _all_ of your IPs? Are you able to correctly reverse resolve your IPs from an external machine?

#3 nagyon

Posted 29 August 2009 - 04:58 PM

Are you sure that your BNC is listening on the correct IP and not _all_ of your IPs? Are you able to correctly reverse resolve your IPs from an external machine?


The BNC is listening on the correct IP, and only that IP. I can resolve both ways fine from an external machine.

Edited by nagyon, 29 August 2009 - 04:58 PM.


#4 mecca_

Posted 30 August 2009 - 06:04 PM


Are you sure that your BNC is listening on the correct IP and not _all_ of your IPs? Are you able to correctly reverse resolve your IPs from an external machine?

The BNC is listening on the correct IP, and only that IP. I can resolve both ways fine from an external machine.


Ok, but in your original post you said:

My bnc is listening on 172.16.0.4, and tcpdump without the -n switch shows the host as desiredhostname.54321 (the latter part being the port), yet WITH -n it shows the IP as 172.16.0.1 (which is the base IP, bound to eth0).


While I'm assuming you have the flags backwards (with -n, tcpdump shouldn't show a hostname; without -n, it should), tcpdump shows that your process is indeed listening on your primary IP. Have you verified that it's listening on the correct IP via netstat/lsof/fuser/etc.? Can you give us some output from these commands to diagnose further? (Assuming they're all on the same subnet, feel free to censor the first 3 octets.) I would double-check your BNC config files; there is probably something small you've overlooked there.

Good luck

Edited by mecca_, 30 August 2009 - 06:07 PM.


#5 nagyon

Posted 31 August 2009 - 03:03 PM

When I wrote the first post, I was positive that that was the case, but when I replied, I double-checked and it was not (and I don't believe I've changed anything). Maybe I'd lost my marbles by that point. Here are a few commands and their outputs:

11 is the primary IP of the machine
14 is the desired IP whose ptr record is desired as hostname

# netstat -ntp
tcp 0 0 xxx.xxx.xxx.14:32123 yyy.yyy.yyy.yyy:39980 ESTABLISHED 1234/bnc
tcp 0 0 xxx.xxx.xxx.14:53194 208.71.169.36:6667 ESTABLISHED 1234/bnc

# lsof -n | grep bnc | grep TCP (SIZE column removed)
bnc 1234 username 3u IPv4 TCP xxx.xxx.xxx.14:31213 (LISTEN)
bnc 1234 username 7u IPv4 TCP xxx.xxx.xxx.14:53194->208.71.169.36:6667 (ESTABLISHED)

# tcpdump -n -i eth0 -v dst port 6667
15:46:57.863521 IP (tos 0x0, ttl 64, id xxxxxx, offset 0, flags [DF], proto TCP (6), length 159) xxx.xxx.xxx.11.53194 > 208.71.169.36.6667: P 130:237(107) ack 3511 win 501 <nop,nop,timestamp 183607616 4529868>

# tcpdump -n -i eth0 -v dst port 53194
15:52:58.211252 IP (tos 0x0, ttl 56, id xxxxxx, offset 0, flags [DF], proto TCP (6), length 52) 208.71.169.36.6667 > xxx.xxx.xxx.11.53194: ., cksum 0xa15c (correct), ack 26 win 2896 <nop,nop,timestamp 4890272 183697652>


yyy.yyy.yyy.yyy is my IP address.

Thanks for your time and help.

#6 mecca_

Posted 01 September 2009 - 10:13 AM

# lsof -n | grep bnc | grep TCP (SIZE column removed)
bnc 1234 username 3u IPv4 TCP xxx.xxx.xxx.14:31213 (LISTEN)
bnc 1234 username 7u IPv4 TCP xxx.xxx.xxx.14:53194->208.71.169.36:6667 (ESTABLISHED)

# tcpdump -n -i eth0 -v dst port 6667
15:46:57.863521 IP (tos 0x0, ttl 64, id xxxxxx, offset 0, flags [DF], proto TCP (6), length 159) xxx.xxx.xxx.11.53194 > 208.71.169.36.6667: P 130:237(107) ack 3511 win 501 <nop,nop,timestamp 183607616 4529868>


You must have accidentally edited something out here or run these commands at very different times. If there were indeed a connection between .11 and freenode, then you would have seen the connection via lsof. I understand that you need your privacy, but cutting out all command output except for what you believe is relevant does make it hard to help :) Anyway, I still think the problem is probably something you overlooked in your BNC config file (assuming you really can reverse resolve all of these IPs externally).

#7 nagyon

Posted 01 September 2009 - 05:54 PM

The output from those commands is verbatim; I only changed the username, process name and pid. Everything was run within 5 minutes of each other, and even if I make a small bash script to run them as quickly as possible right after one another, the result is the same. That's why this is so bizarre and baffling to me!

My bnc config file is 15 lines long, and there's only one place to specify an IP and port, and they're both set correctly. On .14 netstat looks good, lsof looks good... but tcpdump is reporting the action happening on .11. I would consider it some sort of bug in tcpdump, but I set up another daemon on a different port, same IP (different program, just to test), and tcpdump reports activity on .14 (and the hostname resolves without -n) as expected. Conversely, I would consider it a bug with my bnc, but with the same config file and version (both compiled from source), it worked on my previous machine. Same O/S, everything.
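For readers following along: the address a program listens on and the source address of its outgoing connections are separate settings at the socket level. The snippet below is a minimal sketch, not the bnc's actual code. It shows how a program can pin the source address of an outgoing TCP connection by binding before connecting, and how to read back what the kernel recorded for the socket. The helper name `connect_from` is made up for illustration, and the loopback address stands in for the real IPs.

```python
import socket


def connect_from(source_ip, dest_host, dest_port, timeout=5.0):
    """Open a TCP connection whose local (source) address is source_ip.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # Binding before connect() fixes the source address.
        # Port 0 lets the kernel choose an ephemeral source port.
        sock.bind((source_ip, 0))
        sock.connect((dest_host, dest_port))
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    # Loopback demo: a throwaway listener stands in for the IRC server.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    conn = connect_from("127.0.0.1", "127.0.0.1", port)
    # getsockname() is the socket-layer view of the local address.
    print("local address:", conn.getsockname())
    print("remote address:", conn.getpeername())
    conn.close()
    listener.close()
```

`getsockname()` reports only what the socket layer believes, which is the same view netstat and lsof give. It says nothing about what a packet capture on the interface will show.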

Is there anything else I can do? Could it have anything to do with 1) a PTR existing at the other data center still (I doubt it's been removed), or 2) the other three IPs on the machine having the same rdns (mind that there are separate PTR records for each, but they are the same hostname)? I'm at my wits' end and about ready to throw in the towel.

#8 mecca_

Posted 01 September 2009 - 11:02 PM

The output from those commands is verbatim; I only changed the username, process name and pid. Everything was run within 5 minutes of each other, and even if I make a small bash script to run them as quickly as possible right after one another, the result is the same. That's why this is so bizarre and baffling to me!


It's verbatim, but it's not complete unless you really only have two network sockets open, which seems very odd for a fail-over machine. If I'm wrong then I apologize, but anyway, forget about this; it's probably not the problem.

1) a PTR existing at the other data center still (I doubt it's been removed)


Is the IP address you're using now the same as the one at your previous datacenter? Are they both managed by the same company? Do they delegate control to your own DNS servers, or do they host DNS for you?

2) the other three IPs on the machine having the same rdns (mind that there are separate PTR records for each, but they are the same hostname)?


I'm not sure what exactly you're trying to say here... What do you mean by three IPs having the same rdns (reverse DNS) but three separate PTR records?

Let's go over a quick reverse DNS query; maybe it will help.

mecca@genome:~$ dig +trace -x 74.125.53.105

; <<>> DiG 9.4.3-P2 <<>> +trace -x 74.125.53.105
;; global options:  printcmd
.                       516737  IN      NS      G.ROOT-SERVERS.NET.
.                       516737  IN      NS      C.ROOT-SERVERS.NET.
.                       516737  IN      NS      A.ROOT-SERVERS.NET.
.                       516737  IN      NS      D.ROOT-SERVERS.NET.
.                       516737  IN      NS      I.ROOT-SERVERS.NET.
.                       516737  IN      NS      F.ROOT-SERVERS.NET.
.                       516737  IN      NS      L.ROOT-SERVERS.NET.
.                       516737  IN      NS      K.ROOT-SERVERS.NET.
.                       516737  IN      NS      B.ROOT-SERVERS.NET.
.                       516737  IN      NS      H.ROOT-SERVERS.NET.
.                       516737  IN      NS      E.ROOT-SERVERS.NET.
.                       516737  IN      NS      M.ROOT-SERVERS.NET.
.                       516737  IN      NS      J.ROOT-SERVERS.NET.
;; Received 504 bytes from 68.87.69.146#53(68.87.69.146) in 9 ms

We're doing a reverse lookup of 74.125.53.105, one of Google's addresses. 'dig' first gets a listing of all of the root servers.

74.in-addr.arpa.        86400   IN      NS      X.ARIN.NET.
74.in-addr.arpa.        86400   IN      NS      Y.ARIN.NET.
74.in-addr.arpa.        86400   IN      NS      Z.ARIN.NET.
74.in-addr.arpa.        86400   IN      NS      CHIA.ARIN.NET.
74.in-addr.arpa.        86400   IN      NS      DILL.ARIN.NET.
74.in-addr.arpa.        86400   IN      NS      BASIL.ARIN.NET.
74.in-addr.arpa.        86400   IN      NS      HENNA.ARIN.NET.
74.in-addr.arpa.        86400   IN      NS      INDIGO.ARIN.NET.
;; Received 199 bytes from 192.228.79.201#53(B.ROOT-SERVERS.NET) in 38 ms

From the root server (B.ROOT-SERVERS.NET), we find out who's authoritative for the 74.0.0.0/8 network. We get a listing of possible name servers (NS).

125.74.in-addr.arpa.    86400   IN      NS      NS2.GOOGLE.COM.
125.74.in-addr.arpa.    86400   IN      NS      NS4.GOOGLE.COM.
125.74.in-addr.arpa.    86400   IN      NS      NS3.GOOGLE.COM.
125.74.in-addr.arpa.    86400   IN      NS      NS1.GOOGLE.COM.
;; Received 126 bytes from 192.35.51.32#53(DILL.ARIN.NET) in 37 ms

From one of the nameservers authoritative for 74.0.0.0/8 we get a list of servers in charge of 74.125.0.0/16. We continue on, getting more specific.

105.53.125.74.in-addr.arpa. 86400 IN    PTR     pw-in-f105.google.com.
;; Received 79 bytes from 216.239.36.10#53(NS3.GOOGLE.COM) in 18 ms

From one of the nameservers (NS3.GOOGLE.COM) authoritative for 74.125.0.0/16 they answer our query. They find a PTR record for the address we looked up and it points to pw-in-f105.google.com.

This whole process is rdns. A PTR record is a single pointer from an address (in its in-addr.arpa form) to a hostname. You said earlier that reverse DNS is working, so if this is unneeded info I apologize; I'm just confused as to what your actual layout is. Try that dig command and make sure that the servers you are expecting to answer are indeed answering.
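To make the names in the trace above less mysterious, here is a small sketch of how an IPv4 address maps to the in-addr.arpa name that gets queried, and to the parent zones that could be delegated along the way. It uses Python's standard `ipaddress` module. The function names `reverse_name` and `delegation_chain` are made up for illustration. Whether a zone cut really exists at each octet boundary depends on how the address block was delegated; in the trace above, the cuts are at the /8 and /16 levels only.

```python
import ipaddress


def reverse_name(ip):
    """Return the in-addr.arpa name queried for a PTR lookup of ip."""
    return ipaddress.ip_address(ip).reverse_pointer


def delegation_chain(ip):
    """List candidate reverse zones for an IPv4 address, least specific first.

    These are only the octet-boundary candidates; real delegations may
    skip some of them.
    """
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    # reverse_pointer looks like '105.53.125.74.in-addr.arpa'
    octets = addr.reverse_pointer.split(".")[:4]
    return [".".join(octets[i:] + ["in-addr.arpa"]) for i in range(3, -1, -1)]


if __name__ == "__main__":
    print(reverse_name("74.125.53.105"))
    for zone in delegation_chain("74.125.53.105"):
        print(zone)
```

For 74.125.53.105 the chain starts with 74.in-addr.arpa and 125.74.in-addr.arpa, which are the two zones whose name servers appear in the dig output.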

The whole BNC process isn't that complicated:

1. The bnc server mediates a connection between your client and IRC
2. When you connect to the IRC server, the server does a reverse lookup (see process above) for whatever IP address it sees the connection as coming from. The result of this is used as the hostmask for your IRC session.
3. ... well there is no 3, that's it. If your client is connecting from the right IP and reverse DNS is working as you expect, I have no idea what else it could be.
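Step 2 can be sketched roughly as follows. This is an assumption-laden illustration, not the code of any particular IRC server: it assumes the server also checks that the hostname resolves back to the connecting address (a forward-confirmed lookup), and that it falls back to the bare IP otherwise. The function name `hostmask_host` and its injectable `reverse` and `forward` parameters are hypothetical, added so the logic can be tried without touching real DNS.

```python
import socket


def hostmask_host(ip, reverse=None, forward=None):
    """Pick the host part of a hostmask for a connection coming from ip.

    reverse: callable mapping an IP to a hostname (PTR lookup).
    forward: callable mapping a hostname to a list of IPs (A lookup).
    Both default to the system resolver. Hypothetical sketch only.
    """
    if reverse is None:
        reverse = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward is None:
        forward = lambda name: socket.gethostbyname_ex(name)[2]
    try:
        name = reverse(ip)
        addresses = forward(name)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return ip
    # Assumed policy: only trust the PTR name if it resolves back to ip.
    return name if ip in addresses else ip


if __name__ == "__main__":
    # Fake resolvers using the example addresses from the first post.
    ptr = {"172.16.0.1": "hostname1", "172.16.0.4": "desiredhostname"}
    a = {"hostname1": ["172.16.0.1"], "desiredhostname": ["172.16.0.4"]}
    for source in ("172.16.0.1", "172.16.0.4"):
        print(source, "->", hostmask_host(source, ptr.__getitem__, a.__getitem__))
```

The point of the sketch is that the result depends only on the source address the server sees, so a connection arriving from the primary IP gets the primary IP's name regardless of which address the bnc listens on.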

Edited by mecca_, 01 September 2009 - 11:02 PM.


#9 nagyon

Posted 04 September 2009 - 03:22 AM

I understand thoroughly the process of resolving an IP to a hostname.

The IP of the new bnc is different from that of the old one; the old and new IPs belong to different data centers and different companies. I set up and maintain DNS personally, but PTR records aren't something that I'm authorized to control for obvious reasons (I don't own the data center).

I'm stumped.

#10 nagyon

Posted 15 September 2009 - 08:59 PM

I'm still thinking this through daily. I said that there were no differences between the machines; there is one glaring difference. The new machine runs a 32-bit kernel; the latter 64-bit. Maybe it's a limitation or bug in the bnc, the network stack, or something in between that exists only in a 64-bit env.



