Meet The Lag

Posted on May 11, 2015 in Tutorials

Tracking Down Ping Issues

We’ve all been there.  You’re fragging up a storm when suddenly you start jittering all over the place and can’t land a single shot.  You press tab only to find that your ping has jumped by 300ms.

Why is my ping bad?

Truth is, there’s no way for anyone to answer that for you without more information.  Let’s run through one example together and take a look at all the things it could be.

Eliminate the Obvious

Is everyone on the server lagging?  If so, it’s most likely the server or the datacenter.  If the issue persists for more than a minute, ask a head admin to look into it and we’ll check on the system health.  Or if you’re into debugging the issue for yourself, move on to the next section.

Is someone on your LAN downloading or uploading files?  Depending on your type of connection, maxing out one direction can hurt the other.  For example, I have 105 Mbit down and 20 Mbit up through Comcast.  If I start uploading at 20 Mbit, that doesn’t mean I still have 105 down, or even 85 down.  I’ve seen our connection drop to 1-2 Mbit down while the uplink is maxed out.  As a rough rule of thumb, I like to think of the caps as percentages: if I’m using 10 Mbit of uplink, that’s 50% of my upload bandwidth, so I assume I have only about 50 Mbit down left.  ISPs and connection types differ (I don’t recall this issue when I was on a direct fibre link), but it’s something to keep in mind.  On slower connections, a sibling or parent using Netflix can be more than enough to ruin your gaming session.
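The percentage rule of thumb above can be written out as a small calculation.  This is only a sketch of that mental model, not a measurement of how any ISP actually behaves, and the function and parameter names are made up for illustration.

```python
def estimated_downlink_mbit(down_cap_mbit, up_cap_mbit, up_in_use_mbit):
    """Rule-of-thumb estimate (an assumption, not a measurement):
    the share of uplink in use costs the same share of downlink."""
    if up_cap_mbit <= 0:
        raise ValueError("up_cap_mbit must be positive")
    # Clamp the used share of the uplink to the range 0..1.
    used_fraction = min(max(up_in_use_mbit / up_cap_mbit, 0.0), 1.0)
    return down_cap_mbit * (1.0 - used_fraction)


# The example from the text: 105 down / 20 up, with 10 Mbit of uplink in use.
print(estimated_downlink_mbit(105, 20, 10))  # 52.5
```

With the uplink fully saturated the estimate drops to zero, which is roughly in line with the 1-2 Mbit observed above.  Treat the number as a hint for what to check, nothing more.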

Are you on wireless?  If so, there’s a good chance that’s the issue.  Wireless typically operates on 2.4GHz (or close to it), which is the same range that plenty of other wireless devices in your house use.  Heck, even your microwave uses it.  If there’s a microwave between you and your wireless access point, your connection may drop anytime someone cooks a molten hot pocket.  I’m not sure why anyone games on wireless, but after working the support desk for a while I’ve noticed this kind of thing tends to get overlooked fairly often.  We can figure out whether it’s the wireless or not in the next steps.

Check the Route

We’re going to use a command-line tool called TraceRoute to get a better idea about what’s going on.  You can use the old cmd terminal on Windows, or the newer PowerShell.  Open the start menu and run one of these terminals.

Before you can pinpoint where the latency occurs, you need to grab the IP address of the server you’re having lag issues with.  In the case of FirePowered, we only have two physical servers but 10 IPs.  If you don’t know the IP, you can find it on this page (remove the port number).  In this example, we’ll just use our webserver’s IP in Ashburn.

Type tracert followed by the server’s IP address in the terminal.  For example, if it were Virginia 2Fort, you would type tracert 23.235.225.106.  It should take about half a minute to complete.  If you’re on Linux, use the traceroute command the same way.  You should see something like this once it completes:
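If you would rather script this step, here is a minimal Python sketch that picks the right tool for the operating system.  The function names and the `system` parameter are illustrative assumptions; only the `tracert` and `traceroute` commands and the example IP come from the text.

```python
import platform
import subprocess


def trace_command(ip, system=None):
    """Return the trace command for this OS as an argument list."""
    system = system or platform.system()
    # Windows ships tracert; this sketch assumes traceroute everywhere else.
    tool = "tracert" if system == "Windows" else "traceroute"
    return [tool, ip]


def run_trace(ip):
    """Run the trace and return its text output (this can take a while)."""
    result = subprocess.run(trace_command(ip), capture_output=True, text=True)
    return result.stdout


# Show the command that would be run on Windows for the example server.
print(" ".join(trace_command("23.235.225.106", "Windows")))
# tracert 23.235.225.106
```

Calling `run_trace("23.235.225.106")` would execute the command and hand back the same text you would see in the terminal, assuming the tool is installed and on your PATH.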

[Image: tracert output]

TraceRoute works by sending three probe packets toward the destination for each hop, with a time-to-live (TTL) that goes up by one each round.  The router where the TTL runs out sends back a reply, and the time that reply takes is the ping you see for that hop.  In this way you can get an idea of where the lag is occurring.  The more you know about how the internet works as a whole, the easier it is to interpret this output.  Don’t worry though, because we’ll walk through the important bits.

Understanding Traceroute

The image above shows my connection going to my router (192.168.1.1), then through my ISP’s various connections a few hours south of here.  It hops onto a backbone owned by Level 3 at the first major city, hits PhoenixNAP, our host’s network, in LA (they’re partnered with Level 3 among others), and jumps to their main datacenter in Phoenix before taking a direct route to Ashburn, VA.  The route might seem a bit funky, but there are many reasons a route might take a roundabout path.  We won’t get into the details, but let’s just assume the routers know better than we do what the fastest route is.  The notable points where my ping increases are at hop 9 and hop 13.  Hop 9 is where the connection makes the trip from Seattle to LA, and hop 13 is the trip from Phoenix, AZ to Ashburn, VA.  This makes sense, and there don’t seem to be any problems.  A 110ms ping is pretty normal for me since I’m just a few dozen miles south of Vancouver, BC.

Now why do you need to know all this?  Two reasons: 1) each device acts differently and 2) it’s good to know what company owns the hop where something has gone wrong.  There are some hops where there’s nothing we can do.  There are going to be places where ping requests aren’t responded to at all, or aren’t responded to in a timely manner.  These tend to be high traffic areas such as backbones and edge routers.  Edge routers live on the edge of datacenters and provide the entire datacenter with internet.  Places like these tend to de-prioritize or even ignore ICMP traffic (such as pings) in an effort to give better service to important traffic.  Because of this behavior, the results on your traceroute may not reflect what’s actually happening.  You can ignore timeouts and large latency increases if the subsequent hops don’t have issues.  If your ping appears to jump from 40ms to 200ms on one hop, but the following ones are only 50ms, just ignore the 200ms response.  If it were truly an issue, all the pings beyond that point would be 200ms+.
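The "ignore the spike if later hops are fine" rule can be sketched in a few lines of Python.  The function name and the sample latencies are made up for illustration and are not taken from a real trace.

```python
def smooth_hop_latencies(latencies_ms):
    """Cap each hop at the lowest latency seen at any later hop.

    latencies_ms: one value per hop in ms, or None for a timeout.
    A packet that reached a later hop in 50 ms cannot really have
    needed 200 ms to get past an earlier one, so the 200 is treated
    as a router that was slow to answer, not slow to forward.
    """
    smoothed = []
    lowest_later = None
    # Walk from the destination back toward your own router.
    for latency in reversed(latencies_ms):
        if latency is None:
            smoothed.append(None)  # timeouts are passed through untouched
            continue
        if lowest_later is not None and latency > lowest_later:
            latency = lowest_later
        lowest_later = latency
        smoothed.append(latency)
    smoothed.reverse()
    return smoothed


# Hypothetical trace: hop 4 spikes to 200 ms and hop 5 times out.
print(smooth_hop_latencies([1, 12, 40, 200, None, 50, 52]))
# [1, 12, 40, 50, None, 50, 52]
```

After smoothing, the 200ms spike disappears because the hops behind it answered in about 50ms.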

Determining the Culprit

The goal is to find the first hop where all subsequent hops have high pings.  In many cases this is normal behavior, such as the Seattle -> LA hop, and Phoenix -> Ashburn hop in the screenshot above.  Use your best judgement.  If it’s a great distance, expect some latency.  You can’t expect your data to travel from LA to VA in a thousandth of a second.
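As a rough sketch, the search for that first hop might look like this in Python.  The function name, the `high_ms` threshold and the sample numbers are illustrative assumptions.  What counts as "high" still depends on the distance involved, so pick the threshold with your own judgement.

```python
def first_sustained_high_hop(latencies_ms, high_ms):
    """Return the 1-based number of the first hop from which every
    responding hop is at or above high_ms, or None if the last
    responding hop is below it (no sustained problem).

    latencies_ms: one value per hop in ms, or None for a timeout.
    """
    culprit = None
    # Walk backward from the destination toward your own router.
    for hop_number in range(len(latencies_ms), 0, -1):
        latency = latencies_ms[hop_number - 1]
        if latency is None:
            continue  # a timeout tells us nothing either way
        if latency < high_ms:
            break  # a healthy hop here clears everything before it
        culprit = hop_number
    return culprit


# Hypothetical trace: hop 4 is an isolated spike, hops 6-8 stay high.
print(first_sustained_high_hop([1, 12, 40, 200, 50, 180, 190, 185], 150))  # 6
# Hypothetical trace: the spike at hop 4 does not persist.
print(first_sustained_high_hop([1, 12, 40, 200, 50, 52], 150))  # None
```

The isolated spike at hop 4 is skipped in both cases because a later hop responded quickly.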

Hop #1: Your router.  If any of the three pings are over 5ms, the issue is likely your router.  If you’re on wireless though, 5ms isn’t too uncommon.  Now 5ms may not seem like a lot, but it usually indicates a bigger problem than just latency.  You can try rebooting your router by unplugging it for ~30 seconds.  If it’s still an issue, factory reset it using the little pinhole button on the back.  If it’s not getting better, then you need a new router.  Yeah, it sucks, but it’s worth it for stable internet.

Hop #2: Your ISP’s headend.  This one can be tricky.  This will be the first hop past your modem, meaning it can be your modem that’s actually causing the issue.  The first thing I would do is restart the modem by unplugging it for about 30 seconds and plugging it back in.  If everything is better, congrats!  Maybe your modem is starting to age and needs to be rebooted once a month or so.  If it’s not better, it could mean a few different things.  Sometimes you need to buy a new modem (here’s the one I use; it works great with Comcast).  Other times it’s your ISP.

The reason this hop is so tricky is that everyone in your neighborhood shares a distribution node.  A node is where all the connections in your local area (street-ish) are routed.  If it’s on fibre, the node generally serves 500-2000 homes.  Usually these aren’t pingable directly.  Think of them like a network switch–they don’t have an IP address and don’t route traffic.  But each node feeds into a regional headend, which is a building within 20-30 miles of you that actually does handle the routing.  If the headend has something wrong with it, your ISP probably already knows.  It can get tricky though when it’s your distribution node having issues since we can’t see those directly.  If the second hop is where your ping is having problems, give it an hour or two, try again, and then contact your ISP if the issue is still present.  Provide the traceroute or a screenshot of it if you can.

Hop #3 and other ISP-owned hops: The third one is generally the main connection your ISP has to the entire city.  Again, contact your ISP if this or any of their hops are causing problems.  In the screenshot above, this includes hops 3-7.

Non-ISP hops: The next few are usually leased backbones from Tier-1 providers.  There aren’t a heck of a lot of companies who run these.  Any issues here will be dealt with and there’s nothing mere mortals can do to speed up the process.  You need to be a customer to talk to support for these places, and unless you’re running networks in the multi-gigabit range, odds are they won’t respond to your requests.  Here’s a network map for our datacenter as an example, and all the backbone providers they plug into.  Our datacenter can open tickets at any of these places but I doubt they would at FirePowered’s request.  It’s just an unusual thing to do.

[Image: PhoenixNAP network map]

Second-to-last Hop: This is probably the edge router.  Don’t expect a response from it.  Usually the latency from the hop just before it is about what you’d expect from this one.  They’re going to be in the same city and probably pretty close to one another.  If you do get a response and this one is responding slower than anticipated, blame the datacenter.  Any customers in that datacenter can open a ticket and have someone look into it.  If this is where your issue starts, report it to us and we’ll get on it.

Last Hop: The actual server.  If this is the only hop causing an issue, odds are something funky is going on with the system or it’s being attacked.  Very high system load or filled pipes due to DDoS traffic can cause networking responses to slow.  Give it a minute or two and if it’s still an issue, report it.  Although with FirePowered servers, we have bots in IRC that start freaking out if something is going wrong.  Doesn’t hurt to bring it up though.  Occasionally this can be the datacenter’s fault if a rack-level network device is malfunctioning, but I’ve never seen this happen.

TL;DR

Run a traceroute.  If it’s the first hop, blame your own equipment.  If it’s anything in your ISP’s space, blame them.  If it’s a backbone, you can’t do anything about it.  And if it’s one of the final two hops, report it to the server owner–in this case, us.
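The same decision tree, as a small Python sketch.  The function name and parameters are made up for illustration, and you have to work out for yourself which hops belong to your ISP (hops 3-7 in the screenshot above).  The total of 14 hops in the example is an assumption.

```python
def who_to_blame(culprit_hop, total_hops, last_isp_hop):
    """Map the hop where the lag starts to whoever can fix it.

    culprit_hop:  1-based hop number where the sustained lag begins
    total_hops:   number of hops in the trace (the last is the server)
    last_isp_hop: last hop that still belongs to your ISP
    """
    if not 1 <= culprit_hop <= total_hops:
        raise ValueError("culprit_hop must be between 1 and total_hops")
    if culprit_hop == 1:
        return "your own equipment"
    if culprit_hop >= total_hops - 1:
        return "the server owner"  # edge router or the server itself
    if culprit_hop <= last_isp_hop:
        return "your ISP"
    return "a backbone provider (nothing you can do)"


# Hypothetical 14-hop trace where the ISP owns everything up to hop 7.
print(who_to_blame(4, 14, 7))   # your ISP
print(who_to_blame(13, 14, 7))  # the server owner
```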

About

Jake is a web developer who loves data and automation. He went to the University of Alaska Fairbanks and studied Computer Science and Information Technology. Most of his work is done in PHP, MySQL, and Javascript.
