
A Possible Cause of AT&T's Wireless Clog: Configuration Errors

Posted by timothy
from the three-card-monty-design dept.
AT&T customers (iPhone users notably among them) have seen some wireless congestion in recent months; Brough Turner thinks the trouble might be self-inflicted. According to Turner, the poor throughput and connection errors can be chalked up to "configuration errors: specifically, congestion collapse induced by misconfigured buffers in their mobile core network." His explanation makes an interesting read.
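Turner's argument rests on a simple relationship: a packet that arrives behind a full FIFO buffer has to wait for everything ahead of it to drain, so the worst-case queuing delay is roughly buffer size divided by link rate. For example, a 256 KiB buffer draining at 1 Mbit/s holds a packet for about 2.1 s. A minimal sketch; the buffer size and link rates are hypothetical values chosen for illustration, not figures from AT&T's network:

```python
def queuing_delay_s(buffer_bytes: int, link_rate_bps: float) -> float:
    """Worst-case wait, in seconds, for a packet arriving behind a full FIFO buffer."""
    if link_rate_bps <= 0:
        raise ValueError("link rate must be positive")
    # Every byte already queued must be transmitted first: bits / (bits per second).
    return buffer_bytes * 8 / link_rate_bps


if __name__ == "__main__":
    # Hypothetical figures for illustration only: a 256 KiB buffer feeding
    # links of various speeds.
    buffer_bytes = 256 * 1024
    for rate_bps in (1_000_000, 384_000, 100_000):
        delay = queuing_delay_s(buffer_bytes, rate_bps)
        print(f"{rate_bps // 1000:>5} kbit/s -> {delay:5.2f} s of queuing delay")
```

If that delay exceeds a TCP sender's retransmission timeout, the sender may resend segments that are still sitting in the queue, so the link ends up carrying duplicates; this is one route from an oversized buffer to congestion collapse.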

  • Software Robustness (Score:3, Interesting)

    by dziman (415307) on Sunday October 25, 2009 @10:13PM (#29868513)

    I find it just as problematic that application software on Windows Mobile and other similar mobile OSes does not handle large network delays gracefully.

    There is often very little feedback to the user that actual progress is being made in the attempt to communicate over the network. Sure, we can use the fuzzy "bars" indicator on the device to help diagnose what may be causing our trouble, but that doesn't indicate actual network conditions due to capacity. We also have animated indicators that web browsers and other applications use, but these still don't indicate any actual success in communicating. In web browsers we get text alluding to the DNS lookup and connection attempt, but combining 'Connecting to...' with a simple spinning indicator or progress bar often doesn't convey whether the message reached any destination, or how long you can expect to wait for a response given the local network's operating conditions.

    The writers of the software may not fully understand the implications of being on a network with high packet loss or long round-trip times, so their code times out or raises errors that could be resolved by more delay or a retry. In a mobile OS we should probably take this into account at the OS level, and opt out of this behavior only when the programmer or user specifies it (if that's exposed).
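    One way an application (or, as suggested above, a platform library) can ride out long delays instead of failing fast is to retry transient errors with a growing wait. A minimal sketch, assuming Python's built-in TimeoutError and ConnectionError stand in for "transient" failures; retry_with_backoff and its default timings are hypothetical choices, not something the comment specifies:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying transient network failures with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller tell the user
            # Double the wait each time (0.5 s, 1 s, 2 s, ...) up to max_delay_s.
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Random "full jitter" keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, delay))
    raise RuntimeError("unreachable")
```

    A caller might wrap a connection attempt such as `lambda: socket.create_connection((host, 80), timeout=30)`. The `sleep` parameter is there so a test, or a UI loop, can substitute its own way of waiting.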

  • safari sux (Score:3, Interesting)

    by peterflat (1326469) on Sunday October 25, 2009 @10:17PM (#29868547)
    It doesn't help that the Safari client on the iPhone will load a page twice. Even if the user closes Safari for a couple of minutes, the current page reloads when the browser is reopened. Lose-lose for everyone.
  • by NynexNinja (379583) on Sunday October 25, 2009 @10:24PM (#29868585)

    I worked for AT&T on their core networks in several parts of the country, and in the early 2000s they had misconfigured all of their Solaris boxes. I worked with the infrastructure group to implement a startup script on Solaris that tuned all the ndd settings for performance. The problem with Solaris is that by default all the TCP, UDP, Ethernet, etc. settings are set for a desktop workstation, not a server. Most system admins know to tune these settings; otherwise, in a lot of cases a multi-CPU box will perform as slowly as a 1-CPU box. Anyway, at the specific companies I worked with (AT&T Broadband / Worldnet in St. Charles, MO was one big one), all the servers were configured without the proper server settings, so we had all kinds of issues as a result. A big one was that the TCP accept queue was not set high enough, so connections to daemons would drop after a low number of connections, making it appear that the box couldn't handle the load. As a result, they had spent millions on numerous servers (in one situation they had over twenty 12-CPU servers just for SMTP).

    These changes seem small; however, changing "ndd" kernel parameters on a Solaris box is not a single task but an infrastructure-wide one, and therefore requires the coordination of dozens of different groups. It really took a long, long time to get this script implemented. It was called "S99nddfix" and it had all the ndd tunable parameters in it. Later, when I worked at a different AT&T group in a different state, I noticed my script had been implemented on all the Solaris servers in the 200+ server environment.
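    The accept-queue symptom can be illustrated with a toy model: connections that finish the TCP handshake wait in a fixed-size queue until the daemon calls accept(), and when a burst arrives faster than the queue allows, the excess is dropped no matter how many CPUs the box has. On Solaris that ceiling is governed by ndd tunables such as tcp_conn_req_max_q. The model and its numbers below are hypothetical, not measurements from AT&T's servers:

```python
from typing import Iterable, Tuple


def simulate_accept_queue(
    arrivals_per_tick: Iterable[int], queue_limit: int, accepts_per_tick: int
) -> Tuple[int, int]:
    """Toy model of a listening socket's accept queue; returns (accepted, dropped).

    Each tick, newly established connections join the queue (any beyond
    queue_limit are dropped), then the daemon accept()s up to accepts_per_tick.
    Connections still queued when the input ends are counted in neither total.
    """
    queued = accepted = dropped = 0
    for arrivals in arrivals_per_tick:
        admitted = min(arrivals, queue_limit - queued)
        dropped += arrivals - admitted
        queued += admitted
        served = min(queued, accepts_per_tick)
        queued -= served
        accepted += served
    return accepted, dropped


if __name__ == "__main__":
    burst = [100, 0, 0, 0, 0]  # hypothetical: 100 SMTP clients connect at once
    # Same daemon speed, different queue limits.
    print(simulate_accept_queue(burst, queue_limit=5, accepts_per_tick=20))    # (5, 95)
    print(simulate_accept_queue(burst, queue_limit=128, accepts_per_tick=20))  # (100, 0)
```

    The daemon is equally fast in both runs; only the queue limit differs, which is why buying more servers does not fix this kind of drop.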

  • by MBCook (132727) <foobarsoft@foobarsoft.com> on Sunday October 25, 2009 @10:29PM (#29868617) Homepage

    Because those people, if they dislike the network enough, will eventually leave.

    This is the problem. Thanks to the competitive barriers (such as the inability to move phones between all but two of the top four networks, and none of the top 3), switching can take a long time: unless they want to pay a large fee, customers must wait for their 2-year contract to expire before they can move to another network.

    And then, you probably lose your phone. So even if you like it, you have to buy either a different phone from the new provider or that provider's version of the same one. Both will cost you even more money, unless you're willing to be stuck on another 2-year contract.

    The US system is very well set up, as far as carrier lock-in goes.

    It's rather amazing how many people go to AT&T for the iPhone. I think they said about 1/3 of their iPhone customers are coming from other networks. I wonder how many more people would get iPhones if it weren't for their current contracts. That's a big reason for many people I've talked to. The rest who want an iPhone are in the "I'd love it but I'm not touching AT&T again" camp.

  • by kaiser423 (828989) on Sunday October 25, 2009 @10:49PM (#29868699)
    BlackBerries are awesome about this with the bidirectional communication arrows. When I'm with friends in an area of low reception, they're all walking around randomly, trying to call every two yards and waiting 15 seconds before determining that it's not going to work. I walk around until I see an incoming arrow, freeze, and then make a call. Works wonderfully.
  • Zero Packet Loss (Score:5, Interesting)

    by Anonymous Coward on Sunday October 25, 2009 @11:03PM (#29868761)

    Zero packet loss may sound impressive to a telephone guy, but it causes TCP congestion collapse and thus doesn't work for the mobile Internet!

    I was in the standardisation group that specified the RLC/MAC layer (ETSI SMG2, later called 3GPP TSG GERAN), and our priority was not the behaviour of TCP. We were designing the radio layer to provide a bearer service for the higher-layer protocols, which at that time were X.25 and IP (UDP and TCP). The "problem" we were trying to solve was the tendency of the radio layer to fade, suffer multipath, and generally lose packets. The RLC layer was designed to deliver error-free packets, in sequence, over the radio layer. Generally that is exactly what it does, and does well. If it didn't, there would be no mobile internet.
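    The in-sequence, error-free delivery described here can be sketched as a receive buffer that holds out-of-order blocks until a retransmission fills the gap. This is a simplified illustration of the general ARQ idea, not the 3GPP RLC procedure (no window, no sequence-number wraparound, no status-report format), and InOrderReceiver is a hypothetical name. It shows the side effect that matters for TCP: blocks behind a lost one are not lost, they wait, and the upper layer sees that wait as delay rather than loss.

```python
from typing import Dict, List


class InOrderReceiver:
    """Simplified ARQ receive side: hold out-of-order blocks, release them in sequence."""

    def __init__(self) -> None:
        self.next_seq = 0                     # next block the upper layer expects
        self.pending: Dict[int, bytes] = {}   # received blocks waiting for a gap to fill

    def receive(self, seq: int, payload: bytes) -> List[bytes]:
        """Take one radio block; return whatever can now be passed up, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate from a retransmission: drop it
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

    def missing(self) -> List[int]:
        """Sequence numbers to request again (the gaps holding up delivery)."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending)) if s not in self.pending]


if __name__ == "__main__":
    rx = InOrderReceiver()
    print(rx.receive(0, b"A"))   # [b'A']
    print(rx.receive(2, b"C"))   # []  (block 1 was lost on the radio link)
    print(rx.receive(3, b"D"))   # []
    print(rx.missing())          # [1]
    print(rx.receive(1, b"B"))   # [b'B', b'C', b'D'] once the retransmission arrives
```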

    What we did find to be a significant performance problem was the asymmetric channel. The uplink is usually the root of the TCP performance issues; UDP works much better. When the asymmetry exceeds 10 (the downlink is ten times faster than the uplink), the TCP ACKs don't arrive in time and the connection stalls. Sadly, a faster uplink is difficult and expensive to provide.
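    The uplink pressure can be estimated: a TCP receiver sends roughly one ACK per one or two full-size segments, so the uplink must carry a steady stream of small packets in proportion to the downlink rate. A back-of-the-envelope sketch; the 1460-byte segment, the 40-byte ACK (IPv4 + TCP headers, no options), and one ACK per two segments are typical assumptions, not figures from the comment:

```python
from typing import Tuple


def ack_load(
    downlink_bps: float,
    mss_bytes: int = 1460,
    segments_per_ack: int = 2,
    ack_bytes: int = 40,
) -> Tuple[float, float]:
    """Uplink ACK traffic needed to keep a TCP download of `downlink_bps` (payload) clocked.

    Returns (ACK packets per second, ACK bits per second on the uplink).
    """
    segments_per_s = downlink_bps / (mss_bytes * 8)
    acks_per_s = segments_per_s / segments_per_ack
    return acks_per_s, acks_per_s * ack_bytes * 8


if __name__ == "__main__":
    for rate in (384_000, 1_000_000, 3_600_000):
        acks, bps = ack_load(rate)
        print(f"{rate / 1e6:.2f} Mbit/s down needs ~{acks:.0f} ACKs/s, ~{bps / 1e3:.1f} kbit/s up")
```

    Under these assumptions the ACK bit rate is only a little over 1% of the downlink (40 bytes per 2 x 1460), so the sketch also reports packets per second: each ACK is a separate uplink transmission, and per-packet access delay on a shared radio uplink can matter as much as raw capacity.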

  • by Kumiorava (95318) on Sunday October 25, 2009 @11:12PM (#29868789)

    I haven't worked for AT&T, but at one point I tried to see what a traceroute from San Diego to Finland would show me, because ping was really slow. The traceroute I ran jumped from the west coast to the east coast twice before going over the Atlantic. I suspect the routing rules might need some fine-tuning as well. It doesn't really matter if you have a very fast network if the data keeps jumping between servers and creating extra traffic; I imagine that in my case the packet could have reached the destination with far fewer hops.

    Of course that visual traceroute I used might not really give accurate locations of the servers.

  • Re:First Time (Score:5, Interesting)

    by MichaelSmith (789609) on Sunday October 25, 2009 @11:25PM (#29868847) Homepage Journal

    I know somebody who works on network infrastructure for Telstra. I suggested to him that a lot of traffic which currently goes through wireless and wired LANs will soon run through the cellular networks. He was horrified at the idea. Apparently TCP/IP traffic from 3G cells has to go all the way back to the internet backbone, so anything resembling P2P still saturates the links between the base stations and the back end. That's a minor issue just now, but in addition the links to the 3G cells are only just keeping up with demand right now.

    I pointed to the European environment, where 3G data is much cheaper and more bandwidth is available. He says that we don't do that kind of investment here. So at the end of the day it's a money problem. Lots of profit being taken while they can get away with it.

  • Re:safari sux (Score:1, Interesting)

    by Anonymous Coward on Sunday October 25, 2009 @11:57PM (#29868955)

    Warning: This will go slightly on a tangent away from AT&T since the unnecessary reload topic was brought up:

    I used Opera from about 2001 to 2006. The versions I installed were 5 and 6, about two versions behind the current release back then. I'm not sure whether this has changed, since I've moved on to FF, Chrome and Safari on Windows, but page reloading is a problem in two ways, and Opera nicely gave you cached pages until you tried to reload. You could force more frequent loads if the default behavior screwed with your news sites, though.

    2 big things:

    1) If you have 30 tabs open from a whole week's browsing session, and the browser reloads ALL of them at startup, you'll choke your DSL connection for a couple minutes, and also peg your processor with these "modern browsers," except maybe for Chrome's multithreaded load.

    2) Some graphic boards have high-transit forum sections that auto-disappear in 20 minutes. If you were reading a long discussion and couldn't finish, then closed your browser and reopened it after those 20 minutes, the page would just be gone when you returned. I prefer the cache behavior that lets me see the page there for weeks until I hit "Refresh" to see if new data has been posted.

    Plus, browsers like FF tend to load everything into RAM and forget about your hard drive cache (I know there are settings to move things around, but they don't seem to do anything, even with just one or two tabs open). If you are looking at twenty 1 MB JPEG pictures that should easily fit in your HD cache and then restart the browser, the session will start downloading all the images again. We low-bandwidth DSL users can't afford to waste time like this.

    I think FF 3 is a bit better at caching and reloads less by default. Remember, you and I can figure out how to go in and try to use a very, very large HD cache. If the browser logic does some weird stuff and ignores you anyway, imagine what this does to average Joes who don't know why the pages are taking so long to display.

  • by RudeIota (1131331) on Monday October 26, 2009 @12:18AM (#29869015) Homepage
    To be fair, MojoRilla's argument was it's one of the "best smartphones out there", not the highest quality and certainly not the most reliable.

    The iPhone has managed to put itself in the hands of many people who've never had a very nice phone, so I think the iPhone is far better quality than a large portion of its user base is used to and comparable to other phones in its class.

    For what it is worth, I believe Apple's selling points are in this order: Features, quality, price (the last two are very close, for better or worse). On the flip side, I feel many computer manufacturers are price first, features second, quality third. But of course, most companies have 'cheap' and 'expensive' lines of computers, so that varies. One thing I can say though, is Apple support is far superior to any support you'll get from another computer manufacturer these days.
  • by nick0909 (721613) on Monday October 26, 2009 @12:21AM (#29869027)
    The arrows show data traffic as well as voice traffic. It is very nice to see a whole lot of up, down, or both arrows flashing when an app is sitting "unresponsive." You know data is flying so nothing is wrong, just wait and the app will respond when it has the data it needs. The arrows (at least on my 8330) are large for the faster network, and thin for the slow network so I even know when it will take longer because of poor network coverage. I used a Windows Mobile phone for a week and it drove me mad not knowing what was going on with the network data.
  • by dUN82 (1657647) on Monday October 26, 2009 @12:34AM (#29869075)
    Totally agree! Having lived in three countries for considerable periods, as a mobile phone user in China (9 years), the UK (5 years), and the US (3 years), I have to say the US has the worst cellphone selection, the worst call quality and coverage, the highest tariffs, the lengthiest contracts, and the most unfair contracts. I had to settle for AT&T when I arrived in the US since I had my own GSM phone, which ruled out Sprint/Verizon, and honestly speaking, I am glad I brought my own phone, because there really wasn't anything to choose from until the iPhone came out. I don't want to start comparing phone brands, but I do want a phone that runs its original OS from its manufacturer and is unlocked, or can at least be unlocked after the contract, which apparently is too much to ask for in the US. I also won't blame the poor network coverage, given the low population density etc., but I pay around twice as much for the same minutes as in the UK, and don't get me started on the cost per minute in China. Further, how on earth can you still charge people for receiving text messages and answering phone calls? Even the state-owned duopoly (or future triopoly) of Chinese mobile providers dumped that policy, and it would be a total joke to UK and EU customers. The 2-year contract is one of the worst parts. I think UK carriers started 18-month contracts about 2 years ago, but they still give you the option of a 12-month contract if you choose to pay a bit more for the handset. Just imagine you might not even be in the US long enough to finish the contract; what should you do then? The worst part is that nothing seems to change or improve. Finally, the issues are rooted in the US having the worst telecommunications business model; capitalism really sucks shit and just does not work for the people!
  • by ratboy666 (104074) <[fred_weigel] [at] [hotmail.com]> on Monday October 26, 2009 @12:37AM (#29869079) Homepage Journal

    TCP/IP completely shields the application from the underlying transport.

    A call is made to resolve a name (dns), a connection is opened, and... data flows. Or not.

    So, your speculated "connectivity feedback" information has to come from a lower level than the application. It has to be in the stack.

    Some platforms do incorporate this feedback from the stack. It just isn't an application responsibility. Even on platforms with good feedback (Blackberry), the applications are not aware.

    And the application layer programmer should DEFINITELY not be making these decisions. If the application wants something other than TCP, the developer does have the option of using UDP.

  • by Alpha830RulZ (939527) on Monday October 26, 2009 @12:39AM (#29869085)

    Europe has many more customers in a much smaller geographic area. I wonder if it isn't a lot cheaper to service them.

  • Re:First Time (Score:4, Interesting)

    by bertok (226922) on Monday October 26, 2009 @12:40AM (#29869091)

    I know somebody who works on network infrastructure for Telstra. I suggested to him that a lot of traffic which currently goes through wireless and wired LANs will soon run through the cellular networks. He was horrified at the idea. Apparently TCP/IP traffic from 3G cells has to go all the way back to the internet backbone, so anything resembling P2P still saturates the links between the base stations and the back end. That's a minor issue just now, but in addition the links to the 3G cells are only just keeping up with demand right now.

    I pointed to the European environment, where 3G data is much cheaper and more bandwidth is available. He says that we don't do that kind of investment here. So at the end of the day it's a money problem. Lots of profit being taken while they can get away with it.

    Yeah, I love the lack of forward planning by Telcos in Australia.

    Some years ago, there was talk of building some huge fiber-optic ring around the Pacific, connecting a bunch of countries. The only telco in Australia at the time that could afford to buy into the project was Telstra. One of the VPs of Telstra was quoted as saying "we have sufficient bandwidth right now". Think about it: the VP of a telco couldn't quite understand the need to maintain exponential growth in bandwidth right when broadband was taking off. Thanks to morons like that overpaid suit, Australia has been bandwidth-starved for a decade, which is why you don't see that many truly "unlimited" plans or free WiFi access points like in other countries.

  • Re:Zero Packet Loss (Score:3, Interesting)

    by NixieBunny (859050) on Monday October 26, 2009 @12:44AM (#29869103) Homepage
    It sounds like the solution might be to implement a custom version of TCP that takes the asymmetry of the physical radio channels into account. Since most mobile platforms have a much higher downlink packet count, a group ack method could provide relief to the unreliable uplink channel.

    Disclaimer: I've only designed one wireless packet data link system in my life, and it was symmetrical.
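    The group-ACK idea leans on TCP acknowledgments being cumulative: one ACK for segment n covers everything before it, so a receiver (or a proxy near it) can send fewer ACKs without hiding any in-order data from the sender. A toy sketch that counts segments rather than bytes; cumulative_acks and its ack_every knob are hypothetical. Standard TCP receivers already delay ACKs to about one per two segments, and stretching that further cuts uplink packets but, in classic implementations where window growth is driven by ACK arrivals, also slows the sender's ramp-up, a trade-off any custom scheme would have to handle.

```python
from typing import List


def cumulative_acks(num_segments: int, ack_every: int) -> List[int]:
    """ACKs a receiver sends for `num_segments` in-order segments when it
    acknowledges every `ack_every` of them, plus one final ACK for any remainder.

    Each value means "everything up to segment n arrived".
    """
    if ack_every < 1:
        raise ValueError("ack_every must be at least 1")
    acks = list(range(ack_every, num_segments + 1, ack_every))
    if num_segments % ack_every:
        acks.append(num_segments)  # flush the tail so the sender is not left waiting
    return acks


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, cumulative_acks(10, k))
```

    With ack_every=4, ten segments cost three uplink ACKs instead of ten, and the last one still tells the sender that all ten arrived.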

  • by darkpixel2k (623900) <aaron@heyaaron.com> on Monday October 26, 2009 @12:46AM (#29869111) Homepage

    Furthermore, US consumers are locked into a contract which ensures a steady income for the service providers.

    There's a challenge for you. Go and buy an unlocked cell phone. Then go to any of the major carriers and try to sign up for service without a contract.

    I tried this a few years ago and not a single carrier would sign me up without a two-year contract. (What's the point of buying an unlocked phone if you can't take it from network to network without locking in to a contract? I might as well get the damn subsidized phone.)

  • Re:Hm (Score:2, Interesting)

    by Anonymous Coward on Monday October 26, 2009 @12:48AM (#29869125)

    I think the key word in what you assumed was being followed is "monitored". They set up the network, walk away, and never monitor it. I just upgraded my AT&T DSL service, but I tested the speed before I submitted the change request. After the request was fulfilled, the speed remained the same. I called, and 20 minutes later they had "corrected" the problem. Set it and forget it is the problem.

  • Re:Zero Packet Loss (Score:2, Interesting)

    by BodeNGE (1664379) on Monday October 26, 2009 @01:37AM (#29869293)
    OP here. It could possibly be done between the IP stack of the device and the GGSN. The six-layer stack diagram in the article does show the IP layer going between the two, but in reality it doesn't. On the mobile side there are usually two IP stacks: the GSM one to the GGSN access point (APN), and the stack presented to the client AP, or to the PC connected to it (called Terminal Equipment in GSM speak). If you tracert from the device, you will usually see the device IP as a loopback address. The GPRS/3G part then takes over and spits out the rest of the IP connection on the GGSN side. The GGSN side may give you a public IP if you pay for it, but usually it is a pooled address and not externally addressable. You could change the "inner" IP link between the GGSN and MS if it was implemented in the MS IP stack. You wouldn't need to implement it in the outer bearer service between the PC and the destination IP. Other than that, there really isn't much you can change/tweak in the network. Smaller buffers will cause packets to be discarded, and then TCP takes over as normal. Setting the MTU size on the GGSN also helps; the optimum from the MS perspective is around 1400 bytes.
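    The MTU advice comes down to header arithmetic. In the core, user packets travel inside a GTP-U tunnel, which on IPv4 adds an outer IP header (20 bytes), a UDP header (8), and a GTP-U header (8 bytes, or 12 with its optional fields). A sketch under stated assumptions: IPv4 everywhere, no IP or TCP options, and a 1500-byte MTU on the transport network; none of these values come from the comment.

```python
IPV4_HEADER = 20   # bytes, no options
UDP_HEADER = 8
GTPU_HEADER = 8    # mandatory part; optional fields add 4 more bytes
TCP_HEADER = 20    # no options


def max_inner_mtu(transport_mtu: int = 1500, gtpu_optional: bool = False) -> int:
    """Largest user IP packet that fits in one GTP-U/UDP/IPv4 packet without fragmenting."""
    overhead = IPV4_HEADER + UDP_HEADER + GTPU_HEADER + (4 if gtpu_optional else 0)
    return transport_mtu - overhead


def tcp_mss(mtu: int) -> int:
    """TCP payload per segment for a given IPv4 MTU (no IP or TCP options)."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    print(max_inner_mtu())                     # 1464
    print(max_inner_mtu(gtpu_optional=True))   # 1460
    print(tcp_mss(1400))                       # 1360
```

    Under those assumptions, anything above 1464 bytes gets fragmented inside the tunnel, so a client MTU around 1400 leaves headroom for optional fields or extra encapsulation elsewhere on the path.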
  • Re:Hm (Score:4, Interesting)

    by phantomfive (622387) on Monday October 26, 2009 @02:14AM (#29869497) Journal
    I don't have an iPhone and don't have experience with this particular problem, but in general there aren't automatic monitoring devices for mobile networks out in the field, so if AT&T wants to know what is happening on the devices, they have to send a team out with tools and monitoring devices to check. If this is a problem that only happens when several iphone users get together in an area at the same time, then the problem may have gone away by the time a team comes out to check (if they come at all).
  • by Anonymous Coward on Monday October 26, 2009 @02:25AM (#29869557)

    When I still used SBC (which as far as I know is owned by AT&T) it wasn't uncommon for our service to drop. The yearly total for the downtime could easily be measured in days, and was part of the reason they got ditched. I found something interesting, though.

    Each time I had to contact a service representative from the company, they'd give me the same old song and dance. Not only was their network fine, I was fine. We seriously had no clue why I couldn't access anything, so finally one day I did some snooping of my own. (This was back when I was really new to this stuff. Go ahead and laugh.) I learned to appreciate the tracert command afterward.

    I was indeed online, and could, with the right information at my disposal, access some servers. What's more, my DNS wasn't down at all. What was happening was my requests for web pages were getting stuck in the net. I'd watch as my message in a bottle floated away, from Indiana to Illinois, to Minnesota, and finally through several provinces before getting stuck in a loop between two routers in western Canada and then expiring. This would go on for hours, with my Internet access crippled. Apparently somewhere along the line my normal route (which took me through Atlanta instead) was going down and the new one was taking me up north instead, to a place from which no traffic could escape. They apparently never rectified this.
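    What that traceroute showed was a routing loop: the same routers answer over and over until the packet's TTL runs out. A small sketch of spotting the pattern in a list of hop addresses, assuming the traceroute output has already been parsed (find_loop is a hypothetical helper, and a repeated address is a strong hint of a loop rather than proof):

```python
from typing import Dict, List, Optional


def find_loop(hops: List[Optional[str]]) -> Optional[List[str]]:
    """Return the repeating cycle of router addresses if any address reappears, else None.

    `hops` lists the responding address for each TTL in order; None stands for a
    hop that timed out ('*' in traceroute output).
    """
    first_seen: Dict[str, int] = {}
    for ttl_index, addr in enumerate(hops):
        if addr is None:
            continue
        if addr in first_seen:
            # Everything from the first sighting up to now is the loop.
            return [h for h in hops[first_seen[addr]:ttl_index] if h is not None]
        first_seen[addr] = ttl_index
    return None


if __name__ == "__main__":
    # Made-up addresses from the documentation ranges (RFC 5737).
    path = ["192.0.2.1", "198.51.100.7", None, "203.0.113.5", "203.0.113.9",
            "203.0.113.5", "203.0.113.9", "203.0.113.5"]
    print(find_loop(path))  # ['203.0.113.5', '203.0.113.9']
```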

  • by j-stroy (640921) on Monday October 26, 2009 @03:02AM (#29869715)
    As I recall, the story went: Mandelbrot was a mathematician at an IBM lab. The engineers were attempting high-speed data networking but were encountering data/signal loss due to some noise. So, like good engineers, they made things more robust (better isolation, grounds, shielding, etc.), but the darn noise was still there. They could not get rid of it. Determined to find the cause, they went to Mandelbrot with a request to analyze the noise and determine its cause, in order to eliminate it.

    Mandelbrot examined the data and found that there were periods of clear signal interrupted by noise. He examined the noise and found that within it were periods of clear signal, interrupted by noise, and so on. Hmmm... He astutely determined that "shit happens" and that what was needed was a redundant protocol, not better shielding. The noise, you see, was inherent in a damped and driven system.

    It was from this that he began his explorations of fractals and chaos theory, and we got robust network protocols.
  • Re:Non-obvious cause (Score:2, Interesting)

    by shentino (1139071) on Monday October 26, 2009 @04:47AM (#29870153)

    I'd much rather have ECN than packet loss. If an application is told straight up that it's suffocating the network, then it can back off immediately and smoothly instead of getting stung by packet drops.
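    With ECN (RFC 3168), a congested router marks packets instead of dropping them, the receiver echoes the mark back to the sender, and the sender reduces its congestion window as it would for a loss. A toy additive-increase/multiplicative-decrease sketch of that reaction; the event list, starting window, and the aimd helper are invented for illustration:

```python
from typing import Iterable, List


def aimd(events: Iterable[str], cwnd: float = 10.0, min_cwnd: float = 1.0) -> List[float]:
    """Congestion window (in segments) after each round trip under AIMD.

    'ok'  -> no congestion signal this round trip: grow by one segment.
    'ecn' -> an ECN-Echo arrived: halve the window (at most once per round trip).
    """
    history = []
    for event in events:
        if event == "ecn":
            cwnd = max(min_cwnd, cwnd / 2)
        elif event == "ok":
            cwnd += 1
        else:
            raise ValueError(f"unknown event: {event!r}")
        history.append(cwnd)
    return history


if __name__ == "__main__":
    print(aimd(["ok", "ok", "ecn", "ok"]))  # [11.0, 12.0, 6.0, 7.0]
```

    The difference from a drop is what the sketch does not have to model: with a loss, the halving waits until the loss is detected (duplicate ACKs or a timeout) and the segment must be sent again; with ECN the signal rides back on an ordinary ACK and nothing is retransmitted.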

  • by obarthelemy (160321) on Monday October 26, 2009 @05:45AM (#29870417)

    Declared (as opposed to observed) customer satisfaction is unreliable. There's a good chance someone who paid twice the price for something will say they're very happy with it, if only because not being happy would make them a sucker. Works for handbags, too ^^ On the other hand, some people are less narcissistic and will require more from a more expensive product. Again, objective data is needed, such as % failing within 1 year.

    Also, newer products / recent purchases tend to bring in better reviews. People are more careful with their new toy, and the device hasn't had time to break.

    I understand your point about AppleCare. I have no anecdotal evidence for or against it; I'm a DIY kinda guy. On the purely hardware side of things, I haven't noticed Apple being any better than other first- or second-tier vendors, or than a good DIY build.

  • by TheRaven64 (641858) on Monday October 26, 2009 @07:32AM (#29870801) Journal

    Universities are a special case. Apple gives their biggest discounts to students - I got lower priced Macs as a student than some of their corporate customers, while most other manufacturers largely ignore this market. Students, by and large, don't have to interoperate with other software. If there's something that they need to run then there will be lab machines provided for them to use, they won't be expected to run it on their own machine. It's much easier to run a minority OS as a student than almost anywhere else.

    Of course, if a lot of graduates are entering the work force with more Mac experience than Windows, then this may have a knock-on effect in industry over the next decade or so...

  • Re:First Time (Score:1, Interesting)

    by Anonymous Coward on Monday October 26, 2009 @11:56AM (#29873317)

    Posting anonymously to protect the innocent.

    I'm a techie who is often chided by coworkers for my willingness to compromise on solutions in order to meet dates. I understand the business case. Shipping is a feature. However, what I cannot understand is being told to implement a partial solution and then being called on the carpet for it some time in the future as support volume goes through the roof. The business makes a case for a cheap phase I product, but never seems to plan for phase II and ongoing support, and acts surprised and misled when those become an issue. Everyone wants it done fast, cheap, *and* perfect the first time (techies included), but just as there are limitations to the budget, there are also limitations to how much planning, deliberation, implementation and testing can be done under pressure, with few requirements, little time, and an undersized workforce. If a techie could see phase II and III on the project plan, see the budget and time allocated for support, and know there was a plan in place to handle the fallout of the phase I compromises, I'm sure there would be no issue. But that's not often the case; phase I rolls out and the next project is started before the deluge of support calls starts to roll in. Then the techie has to juggle the same high-pressure workload of a new project in addition to the support load of the old project, with no end in sight.

    That being the case, is it any wonder we fight for robust solutions?

  • Re:First Time (Score:3, Interesting)

    by umghhh (965931) on Monday October 26, 2009 @12:52PM (#29873955)
    You of course have a point, but I have seen enough managers whose eyes glazed over before the first sentence of the explanation had even been completed. It is difficult to explain things to people who became managers because they were technically incompetent, too aggressive to work in a team, or jump too frequently between companies to have any idea what the current one actually does. Obviously MBA courses treat the actual work, beyond financial success, as a uniform mass that can be sliced, transported, etc. without any effect on the end product. That is evidently not correct, though it may be hard to perceive, since it is difficult to see things clearly in an ever-changing world of cost cuts, the bonuses they induce, and position hops (made to avoid consequences).
  • Re:First Time (Score:2, Interesting)

    by MadKeithV (102058) on Tuesday October 27, 2009 @04:00AM (#29881483)
    Maybe my vision is coloured because I'm a techie who's been a manager, and went back to an intra-company consulting role to act as the translator / mediator between the techies and management.

    Management here actually has a clue, or rather, they know that technically they don't have a clue and will defer to the people who *do* know.

