A Possible Cause of AT&T's Wireless Clog — Configuration Errors
AT&T customers (iPhone users notably among them) have seen some wireless congestion in recent months; Brough Turner thinks the trouble might be self-inflicted. According to Turner, the poor throughput and connection errors can be chalked up to "configuration errors; specifically, congestion collapse induced by misconfigured buffers in their mobile core network." His explanation makes an interesting read.
Software Robustness (Score:3, Interesting)
I find it just as problematic that applications software on Windows Mobile and other similar mobile OSes do not handle large network delays gracefully.
The software usually gives the user very little feedback about whether an attempt to communicate over the network is actually making progress. Sure, we can use the fuzzy "bars" indicator on the device to help diagnose what may be causing our trouble, but that doesn't reflect actual network conditions such as available capacity. We also have the animated indicators that web browsers and other applications use, but those still don't indicate any actual communication success. In web browsers we get text alluding to the DNS lookup and the connection attempt, but 'Connecting to...' combined with a simple spinning indicator or progress bar doesn't convey whether the message reached any destination, or how long you can expect to wait for a response given your local network's operating conditions.
The writers of the software may not fully understand the implications of being on a network with high packet loss or long round-trip times. So their applications time out, or fail with errors that a longer delay or a retry would have resolved. In a mobile OS we should probably handle this at the OS level, and opt out of that behavior only when the programmer or user explicitly asks (if that's exposed at all).
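A minimal sketch of the kind of delay-tolerant retry logic being suggested here; the function name and its defaults are hypothetical, not from any particular mobile OS:

```python
import random
import time

def fetch_with_backoff(fetch, max_attempts=5, base_delay=0.5):
    """Retry a flaky network call with exponential backoff and jitter.

    `fetch` is any zero-argument callable that raises OSError on
    failure.  The names and defaults are illustrative only; the point
    is that high loss and long RTTs call for patience, not an
    immediate error dialog.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of patience; surface the error
            # Exponential backoff with jitter: tolerate long RTTs
            # instead of failing on the first slow response.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random())
            time.sleep(delay)
```

An OS-level service could wrap every outbound request this way and only expose a knob to opt out, which is roughly what the comment proposes.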
safari sux (Score:3, Interesting)
I know this first hand (Score:5, Interesting)
I worked for AT&T in several parts of the country on their core networks. In the early 2000s they had misconfigured all of their Solaris boxes, and I worked with the infrastructure group to implement a startup script on Solaris to tune all the ndd settings for performance. The problem with Solaris is that by default all the TCP, UDP, Ethernet, etc. settings are tuned for a desktop workstation, not a server. Most system admins know to tune these settings; otherwise, in a lot of cases, a multi-CPU box will perform as slowly as a 1-CPU box. Anyway, at the specific companies I worked with (AT&T Broadband / Worldnet in St. Charles, MO was one big one), all the servers lacked the proper settings for a server, so we had all kinds of issues as a result. A big one was that the TCP accept queue was not set high enough, so connections to daemons would drop after a low number of connections, making it appear that the box couldn't handle the load. As a result, they had spent millions on numerous servers (in one situation they had over twenty 12-CPU servers just for SMTP...
These changes seem small; however, changing ndd kernel parameters on a Solaris box is not a one-off task, it is an infrastructure-wide task, and therefore requires the coordination of dozens of different groups. It took a very long time to get this script implemented. It was called "S99nddfix" and it contained all the ndd tunable parameters. Later, when I worked at a different AT&T group in a different state, I noticed my script had been implemented on all the Solaris servers in that 200+ server environment.
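For illustration, an rc script in the spirit of the S99nddfix described above might look like the following; the ndd tunable names are real Solaris parameters, but the values are examples only, not the settings AT&T actually used:

```shell
#!/sbin/sh
# Illustrative sketch of an S99-style rc script. Tunable names are
# genuine Solaris ndd parameters; the values are example server-class
# settings, not the ones from the original S99nddfix.

# Raise the TCP accept (listen) queues so daemons under load stop
# silently dropping new connections -- the exact symptom described above.
ndd -set /dev/tcp tcp_conn_req_max_q 1024
ndd -set /dev/tcp tcp_conn_req_max_q0 4096

# Server-sized send/receive windows instead of workstation defaults.
ndd -set /dev/tcp tcp_xmit_hiwat 262144
ndd -set /dev/tcp tcp_recv_hiwat 262144

# Shorten the long TIME_WAIT default so a busy server recycles ports.
ndd -set /dev/tcp tcp_time_wait_interval 60000
```

Because ndd settings don't survive a reboot, putting them in an /etc/rc3.d/S99* script is the standard way to reapply them at boot, which is presumably why the fix took the form it did.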
Re:Why? Because they care... (Score:5, Interesting)
This is the problem. Thanks to the competitive barriers (such as the inability to move a phone between all but two of the top four networks), switching carriers can take a long time, since the two-year contract must expire first, unless you want to pay a large fee.
And then you probably lose your phone. So even if you like it, you have to buy either a different phone from the new provider, or the same one in their version. Both will cost you even more money, unless you're willing to be stuck in another two-year contract.
The US system is very well set up, as far as carrier lock-in goes.
It's rather amazing how many people go to AT&T for the iPhone. I think they said about 1/3 of their iPhone customers are coming from other networks. I wonder how many more people would get iPhones if it wasn't for their current contract? That's a big reason for many people I've talked to. The rest who want an iPhone are in the "I'd love it but I'm not touching AT&T again" camp.
Re:Software Robustness (Score:5, Interesting)
Zero Packet Loss (Score:5, Interesting)
Zero packet loss may sound impressive to a telephone guy, but it causes TCP congestion collapse and thus doesn't work for the mobile Internet!
I was in the standardisation group that specified the RLC/MAC layer (ETSI SMG2, later called 3GPP TSG GERAN), and our priority was not the behaviour of TCP. We were designing the radio layer to provide a bearer service for the higher-layer protocols, which at that time were X.25 and IP (UDP and TCP). The "problem" we were trying to solve was the tendency of the radio layer to fade, suffer multipath and generally lose packets. The RLC layer was designed to deliver error-free packets, in sequence, over the radio layer. Generally that is exactly what it does, and does well. If it didn't, there would be no mobile internet.
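As a toy illustration of that in-sequence, error-free delivery, here is a minimal reassembly buffer of the kind an acknowledged-mode RLC receiver maintains; the class and method names are mine, not from any spec:

```python
class InOrderReassembler:
    """Toy model of acknowledged-mode RLC delivery: release blocks to
    the upper layer only in sequence, buffering out-of-order arrivals
    and NACKing gaps so the sender retransmits (illustrative only)."""

    def __init__(self):
        self.expected = 0   # next sequence number to deliver upward
        self.buffer = {}    # out-of-order blocks awaiting delivery

    def receive(self, seq, block):
        """Accept a block; return the list of blocks now deliverable."""
        self.buffer[seq] = block
        delivered = []
        while self.expected in self.buffer:
            delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
        return delivered

    def missing(self):
        """Sequence numbers to NACK so the sender retransmits them."""
        if not self.buffer:
            return []
        return [s for s in range(self.expected, max(self.buffer) + 1)
                if s not in self.buffer]
```

The flip side, as the comment title notes, is that this retransmit-until-perfect behavior hides loss from TCP, so TCP never sees the congestion signal it was designed around.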
What we did find to be a significant performance problem was the asymmetric channel. The uplink is usually the root of the TCP performance issues; UDP works much better. When the asymmetry exceeds about 10:1, i.e. the downlink is more than ten times faster than the uplink, the TCP ACKs don't arrive in time and the connection stalls. Sadly, a faster uplink is difficult and expensive to provide.
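A back-of-envelope sketch of why the uplink matters: the downlink can only run as fast as the uplink can carry ACKs to clock it. The numbers below are textbook assumptions (40-byte ACKs, delayed ACKs covering two 1460-byte segments), not measurements from any real network:

```python
def ack_limited_downlink(uplink_bps, mss_bytes=1460, ack_bytes=40,
                         segs_per_ack=2):
    """Upper bound on downlink TCP throughput imposed by a slow uplink.

    With delayed ACKs (roughly one 40-byte ACK per two full segments),
    the uplink can only carry enough ACKs per second to clock a
    limited amount of downlink data.  All constants are assumptions.
    """
    acks_per_sec = uplink_bps / (ack_bytes * 8)
    return acks_per_sec * segs_per_ack * mss_bytes * 8  # bits/sec
```

On paper even a thin uplink can clock a lot of downlink data; the trouble in practice is that ACKs queue behind uplink data and arrive late or bunched, which is the stalling behavior described above.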
Re:I know this first hand (Score:4, Interesting)
I haven't worked for AT&T, but at one point I tried to see what a traceroute from San Diego to Finland would show me, because ping was really slow. The traceroute I ran jumped from the west coast to the east coast twice before going over the Atlantic. I suspect that routing rules might need some fine tuning as well. It doesn't really matter if you have a very fast network if the data keeps bouncing between servers creating extra traffic; I can imagine that in my case the packet could have reached the destination in far fewer hops.
Of course that visual traceroute I used might not really give accurate locations of the servers.
Re:First Time (Score:5, Interesting)
I know somebody who works on network infrastructure for Telstra. I suggested to him that a lot of traffic which currently goes through wireless and wired LANs will soon run through the cellular networks. He was horrified at the idea. Apparently TCP/IP traffic from 3G cells has to go all the way back to the internet backbone, so anything resembling P2P saturates the links between the base stations and the back end. That's a minor issue just now, but in addition, the links to the 3G cells are only just keeping up with demand.
I pointed to the European environment where 3G data is much cheaper and more bandwidth is available. He says that we don't do that kind of investment here. So at the end of the day its a money problem. Lots of profit being taken while they can get away with it.
Re:safari sux (Score:1, Interesting)
Warning: This will go slightly on a tangent away from AT&T since the unnecessary reload topic was brought up:
I used Opera from about 2001 to 2006. The versions I had installed were 5 and 6, about two releases behind the current one back then. I'm not sure if this has changed, because I've since moved to FF, Chrome and Safari on Windows, but page reloading is a problem in two ways, and Opera beautifully gave you cached pages until you explicitly reloaded. You could force more frequent loads if the default behavior screwed with your news sites, though.
2 big things:
1) If you have 30 tabs open from a whole week's browsing session, and the browser reloads ALL of them at startup, you'll choke your DSL connection for a couple minutes, and also peg your processor with these "modern browsers," except maybe for Chrome's multithreaded load.
2) Some graphic boards have high-traffic forum sections where threads auto-disappear in 20 minutes. If you were reading a long discussion and couldn't finish, closed your browser, and reopened it past those 20 minutes, the page would just be gone when you returned. I prefer the cache behavior that lets me see the page there for weeks, until I hit "Refresh" to see if new data has been posted.
Plus, browsers like FF tend to load everything into RAM and forget about your hard-drive cache (I know there are settings to move things around, but they don't seem to do anything, even with just one or two tabs open). If you are looking at twenty 1 MB JPEGs that should easily be stored on your HD, and you restart the browser, the session will start downloading all the images again. We low-bandwidth DSL users can't afford to waste time like this.
I think FF 3 is a bit better at caching and reloading less by default. Remember, you and I can figure out how to go in and set a very, very large HD cache. If the browser logic does some weird stuff and ignores it anyway, then imagine what this does to average Joes who don't know why the pages are taking so long to display.
Re:AT&T Trouble Self Inflicted? (Score:3, Interesting)
The iPhone has managed to put itself in the hands of many people who've never had a very nice phone, so I think it is far better quality than a large portion of its user base is used to, and comparable to other phones in its class.
For what it is worth, I believe Apple's selling points are in this order: Features, quality, price (the last two are very close, for better or worse). On the flip side, I feel many computer manufacturers are price first, features second, quality third. But of course, most companies have 'cheap' and 'expensive' lines of computers, so that varies. One thing I can say though, is Apple support is far superior to any support you'll get from another computer manufacturer these days.
Re:Software Robustness (Score:5, Interesting)
Re:AT&T Trouble Self Inflicted? (Score:2, Interesting)
Re:Software Robustness (Score:4, Interesting)
TCP/IP completely shields the application from the underlying transport.
A call is made to resolve a name (dns), a connection is opened, and... data flows. Or not.
So, your speculated "connectivity feedback" information has to come from a lower level than the application. It has to be in the stack.
Some platforms do incorporate this feedback from the stack. It just isn't an application responsibility. Even on platforms with good feedback (Blackberry), the applications are not aware.
And the application layer programmer should DEFINITELY not be making these decisions. If the application wants something other than TCP, the developer does have the option of using UDP.
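To make that division of responsibility concrete, here is a small sketch of roughly everything a portable application can observe through the socket API: whether a connect completed within a deadline and how long it took. The function name and timeout are illustrative; the point is what is *not* visible here:

```python
import socket
import time

def probe_tcp(host, port, timeout_s=5.0):
    """Roughly all a portable app can learn about the path from the
    socket layer: did a TCP connect complete within a deadline, and
    how long did it take.  Radio state, signal strength, buffering and
    queueing delay are invisible at this level; that feedback has to
    come from the stack or platform APIs, as the comment says.
    (Function name and default timeout are illustrative.)
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return ("connected", time.monotonic() - start)
    except OSError:
        return ("failed", time.monotonic() - start)
```

Everything beyond "it connected in N seconds" or "it failed" would have to be surfaced by the stack, which is exactly why the parent argues this isn't an application-layer decision.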
Re:AT&T Trouble Self Inflicted? (Score:3, Interesting)
Europe has many more customers in a much smaller geographic area. I wonder if it isn't a lot cheaper to service them.
Re:First Time (Score:4, Interesting)
I know somebody who works on network infrastructure for Telstra. I suggested to him that a lot of traffic which currently goes through wireless and wired LANs will soon run through the cellular networks. He was horrified at the idea. Apparently TCP/IP traffic from 3G cells has to go all the way back to the internet backbone, so anything resembling P2P saturates the links between the base stations and the back end. That's a minor issue just now, but in addition, the links to the 3G cells are only just keeping up with demand.
I pointed to the European environment where 3G data is much cheaper and more bandwidth is available. He says that we don't do that kind of investment here. So at the end of the day its a money problem. Lots of profit being taken while they can get away with it.
Yeah, I love the lack of forward planning by Telcos in Australia.
Some years ago, there was talk of building some huge fiber-optic ring around the Pacific, connecting a bunch of countries. The only telco in Australia at the time that could afford to buy into the project was Telstra. One of the VPs of Telstra was quoted as saying "we have sufficient bandwidth right now". Think about it: the VP of a telco couldn't quite understand the need to maintain exponential growth in bandwidth right when broadband was taking off. Thanks to morons like that overpaid suit, Australia has been bandwidth-starved for a decade, which is why you don't see that many truly "unlimited" plans or free WiFi access points like in other countries.
Re:Zero Packet Loss (Score:3, Interesting)
Disclaimer: I've only designed one wireless packet data link system in my life, and it was symmetrical.
Re:AT&T Trouble Self Inflicted? (Score:3, Interesting)
Furthermore US consumers are locked into a contract which ensures a steady income for the service providers.
There's a challenge for you. Go and buy an unlocked cell phone. Then go to any of the major carriers and try to sign up for service without a contract.
I tried this a few years ago and not a single carrier would sign me up without a two year contract. (What's the point of buying an unlocked phone if you can't take it from network to network without locking in to a contract. I might as well get the damn subsidized phone.)
Re:Hm (Score:2, Interesting)
I think the key word you assumed was being followed is "monitored". They set up the network, walk away, and never monitor. I just upgraded my AT&T DSL service, but tested the speed before I submitted the change request. After the request was fulfilled, the speed remained the same. I called, and 20 minutes later they had "corrected" the problem. Set it and forget it is the problem.
Re:Zero Packet Loss (Score:2, Interesting)
Re:Hm (Score:4, Interesting)
I Can Vouch For This - (Score:1, Interesting)
When I still used SBC (which as far as I know is owned by AT&T) it wasn't uncommon for our service to drop. The yearly total for the downtime could easily be measured in days, and was part of the reason they got ditched. I found something interesting, though.
Each time I had to contact a service representative from the company, they'd give me the same old song and dance. Not only was their network fine, I was fine. We seriously had no clue why I couldn't access anything, so finally one day I did some snooping of my own. (This was back when I was really new to this stuff. Go ahead and laugh.) I learned to appreciate the tracert command afterward.
I was indeed online, and could, with the right information at my disposal, access some servers. What's more, my DNS wasn't down at all. What was happening was my requests for web pages were getting stuck in the net. I'd watch as my message in a bottle floated away, from Indiana to Illinois, to Minnesota, and finally through several provinces before getting stuck in a loop between two routers in western Canada and then expiring. This would go on for hours, with my Internet access crippled. Apparently somewhere along the line my normal route (which took me through Atlanta instead) was going down and the new one was taking me up north instead, to a place from which no traffic could escape. They apparently never rectified this.
Benoit Mandelbrot had a similar problem (Score:5, Interesting)
Mandelbrot examined the data and found that there were periods of clear signal interrupted by noise. He examined the noise and found that within it were periods of clear signal, interrupted by noise, and so on. Hmmm... He astutely determined that "shit happens" and what was needed was a redundant protocol, not better shielding. The noise, you see, was inherent in a damped and driven system.
It was from this that he began his explorations of fractals and chaos theory, and we got robust network protocols.
Re:Non-obvious cause (Score:2, Interesting)
I'd much rather have ECN than packet loss. If an application is told straight up that it's suffocating the network, then it can back off immediately and smoothly instead of getting stung by packet drops.
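A toy sketch of the difference, assuming a classic AIMD sender; the point is that an ECN mark triggers the same window halving as a drop, but the marked packet was still delivered, so nothing has to be detected as lost and retransmitted (names and constants are illustrative):

```python
def aimd_window(events, cwnd=10.0):
    """Toy AIMD congestion window reacting to per-RTT feedback.

    "ecn" and "drop" both halve the window (multiplicative decrease),
    but with ECN the marked packet still arrived, so the sender backs
    off smoothly instead of stalling on loss recovery.  "ok" means an
    RTT with no congestion signal (additive increase).
    """
    for ev in events:
        if ev in ("ecn", "drop"):
            cwnd = max(cwnd / 2.0, 1.0)  # multiplicative decrease
        else:  # "ok"
            cwnd += 1.0                  # additive increase
    return cwnd
```

In this simplified model the window trajectory is identical for ECN and drops; the real-world win is that ECN avoids the lost data, the retransmission and the timeout risk.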
Re:Fashion is transient (Score:3, Interesting)
Declared (as opposed to observed) customer satisfaction is unreliable. There's a good chance someone who paid twice the price for something will say they're very happy with it, if only because not being happy would make them a sucker. Works for handbags, too ^^ On the other hand, some people are less narcissistic and will require more from a more expensive product. Again, objective data is needed, such as % failing within 1 year.
Also, newer products / recent purchases tend to bring in better reviews. People are more careful with their new toy, and the device hasn't had time to break.
I understand your point about AppleCare. I have no anecdotal evidence for or against it; I'm a DIY kinda guy. On the purely hardware side of things, I haven't noticed Apple being any better than other 1st- or 2nd-tier vendors, or than a good DIY build.
Re:Fashion is transient (Score:3, Interesting)
Universities are a special case. Apple gives their biggest discounts to students - I got lower priced Macs as a student than some of their corporate customers, while most other manufacturers largely ignore this market. Students, by and large, don't have to interoperate with other software. If there's something that they need to run then there will be lab machines provided for them to use, they won't be expected to run it on their own machine. It's much easier to run a minority OS as a student than almost anywhere else.
Of course, if a lot of graduates are entering the work force with more Mac experience than Windows, then this may have a knock-on effect in industry over the next decade or so...
Re:First Time (Score:1, Interesting)
Posting anonymously to protect the innocent.
I'm a techie who is often chided by coworkers for my willingness to compromise on solutions in order to meet dates. I understand the business case. Shipping is a feature. However, what I cannot understand is being told to implement a partial solution and then being called on the carpet for it some time in the future as support volume goes through the roof. The business makes a case for a cheap phase I product, but never seems to plan for phase II and ongoing support, and acts surprised and misled when those become an issue. Everyone wants it done fast, cheap, *and* perfect the first time (techies included), but just as there are limitations to the budget, there are also limitations to just how much planning, deliberation, implementation and testing can be done under pressure, with few requirements, little time, and an undersized workforce. If a techie could see phase II and III on the project plan, see the budget and time allocated for support, and know there was a plan in place to handle the fallout of the phase I compromises, I'm sure there would be no issue. But that's not often the case; phase I rolls out and the next project is started before the deluge of support calls starts to roll in. Then the techie has to juggle the same high-pressure workload of a new project in addition to the support load of the old project, with no end in sight.
That being the case, is it any wonder we fight for robust solutions?
Re:First Time (Score:3, Interesting)
Re:First Time (Score:2, Interesting)
Management here actually has a clue, or rather, they know that technically they don't have a clue and will defer to the people who *do* know.