Sunday, November 8, 1998
Topic/Presenter
Full Abstract

Speakers
Mukesh Agrawal, IPMA/U-M
Abha Ahuja, Merit Network
Jimmy Wang, IPMA/U-M

Full Abstract

This tutorial reviews some of the more subtle points of CIDR, aggregation, and renumbering. Included are tricks and techniques that the newer ISP might need, including pure address administration and procedures for submitting address space justifications. Hank Nussbacher's CIDR FAQ is a prerequisite for the session.

Speakers
Howard Berkowitz

Full Abstract

Tips for ISPs on external route selection, including the BGP MED and LOCAL_PREF attributes; peering at multiple locations; backup transit; and how to mix transit, public, and private peering.
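For context (not part of the original abstract), LOCAL_PREF is compared before AS-path length in BGP best-path selection, while MED is consulted later and only between routes learned from the same neighboring AS. A minimal Python sketch of that ordering, using hypothetical route records:

# Hedged sketch of the core BGP best-path ordering mentioned above:
# higher LOCAL_PREF wins first, then shorter AS path; MED is compared
# later and only between routes learned from the same neighboring AS.
from dataclasses import dataclass

@dataclass
class Route:                  # hypothetical route record, not a real BGP RIB entry
    peer_as: int
    local_pref: int
    as_path_len: int
    med: int

def better(a: Route, b: Route) -> Route:
    """Return the preferred route under a simplified decision process."""
    if a.local_pref != b.local_pref:
        return a if a.local_pref > b.local_pref else b
    if a.as_path_len != b.as_path_len:
        return a if a.as_path_len < b.as_path_len else b
    if a.peer_as == b.peer_as and a.med != b.med:
        return a if a.med < b.med else b
    return a                  # remaining tie-breakers omitted

transit = Route(peer_as=64500, local_pref=100, as_path_len=2, med=0)
peer = Route(peer_as=64511, local_pref=200, as_path_len=3, med=50)
print(better(transit, peer))  # the peer route wins on LOCAL_PREF despite a longer AS path

In the example, the peer route wins on LOCAL_PREF even though its AS path is longer, which is the usual way a preference for peering over transit is expressed.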

Speakers
Avi Freedman, Net Access

Full Abstract

Speakers
Bill St. Arnaud, CANARIE

Full Abstract

Includes:

Domestic Users of Caching

Enabling Technologies

Speakers
Moderator - Peter Danzig, Network Appliance
Panelist - James Aviani, Cisco Systems
Panelist - Bill Maggs, MCI
Panelist - Shirish Sathaye, Alteon
Speaker - Ed Kern, DIGEX

Monday, November 9, 1998
Topic/Presenter
Full Abstract

Speakers
Moderator - Dave Meyer, Cisco Systems
Panelist - Danielle Deibler, DIGEX
Panelist - Mujahid Khan, Sprint
Panelist - Henry Kilmer, DIGEX
Panelist - Dorian Kim, Verio
Panelist - Jian Li, Qwest
Panelist - Doug Pasko, Cable & Wireless
Panelist - Steve Rubin, AboveNet
Panelist - Amir Tabdili, Sprint

Full Abstract

A Plea for Input from the IETF Multicast-Address Allocation (MALLOC) Working Group

Speakers
Dave Thaler, Microsoft

Full Abstract

Juniper Networks

Full Abstract

Scribe Notes

(The group of about 120 people met from 7:30-9:00 PM Monday evening to discuss the issues folks have run into constructing and operating Internet Data Centers. A dozen or so attendees in addition to the panel had constructed or were installing Internet Data Centers, and they shared their experiences in the finest tradition of a BOF. Many others contributed observations and suggestions based on their experience with Internet facilities. I've tried to capture the nature of the discussion in these notes. After we ran out of time, many folks adjourned to a room upstairs and continued expanding the notes a bit in smaller focus groups. I wasn't involved in all of these discussions, and folks added material to the lists; I've tried to capture these additions as well.)


I've included two appendices: Michael P. Lucking contributed a worksheet for calculating the BTU cooling units required for data centers, and I added the panel's conference-call notes highlighting discussion points that we may not have covered during the BOF.


Format of this document

We came up with a list of Internet Data Center issues and made recommendations based on the experience of the group. The intent was to share information about critical infrastructure support, not necessarily to cover all aspects of data center construction. The hope is that this document will help folks know what to look for in stable infrastructure in which to put our Internet equipment.


Power

The big issue here is grounding, for which there are military specs (source?).


Internet Data Centers are moving toward requiring both AC and DC. There was some debate over the relative benefits of AC (it's all most ISPs know) and DC (cleaner, more consistent).


Security of the grounding system itself is a concern: one can tap into the data system through the grounding system.


Resources: the IEEE Emerald Book, which describes data and electrical grounding; the Green Book, which covers commercial grounding; and the NEBS standards for office space delivery.


There is a multi-point vs. single-point grounding tradeoff. The recommendation is to use multi-drop for simplicity of engineering.


Recommendation: keep a telephone, an electrical outlet, a flashlight, a telescoping dental mirror, and headtop mining lights in the power room. Make sure you have extra brass (melting) fuses in stock (100 Amp, 300 Amp, and 600 Amp bars).


Have an EPO (emergency power off) panic button with a cover.


2N/N+1 redundancy is required for all critical facilities.


Consider provisioning more power than expected for facilities staff workstations.


Emergency power off is part of the code. There is a code difference between a facility called a "Computer Room" and one called something else. The suggestion was that you may be better off using another name, because the codes for a computer room may require non-optimal implementations for Internet facilities.


Power conditioning is required. The noise and frequency of serious spikes vary and require you to condition the lines.


Checking on power shouldn't require power


Backup UPS/generator power for the facility dictates how long humans can inhabit the facility and therefore determines the life of the facility. Security systems need to be on the same protected power system as the rest of the facility.


Redundant power through multiple grids is non-trivial to do but highly desirable. Experience has shown that Internet facilities typically have an A/B bus for power and 30 Amps delivered to each rack.


There was some recommendation for data and power lines to be on different planes. Power and data run overhead needs to be well supported (weight is a concern) and, in seismic areas, well braced.


A recommendation was made for fiber-impregnated batteries to avoid acid on the floor.


Most importantly, all Internet facilities must have generators for surviving long-term power outages (especially as we go through competitive deregulation in the power industry…). Diesel fuel may raise EPA issues in some areas. Diesel fuel contamination is an issue; one solution some use is multiple fuel tanks. Contracts for trucking in additional fuel are suggested; however, it was also pointed out that during this type of crisis, other facilities (emergency support, police, hospitals, etc.) may get priority service, and indeed your supplier may not have the robustness required. Fire drills here may help. Dual contracts for fuel delivery were also suggested.


HVAC

The cooling needs are unique to Internet Data Centers. The amount of heat generated by some of this equipment is extreme and highly variable in our industry.


Chilled-water cooling was recommended for equipment, along with heating/cooling accommodations for emergency staff.


Rule of thumb: kW/20 = t, where t is the number of tons of cooling required.


Tradeoffs include raised floors versus relay racks


Important issues include redundancy for HVAC systems


Load shedding protocol - when to turn off monitors in data center?


Monitor HVAC system


Humidity - over cooling causes condensation on equipment, too dry leads to excessive static


Watch for hotspots and cold spots. One exchange is frigid under the vent and still hot elsewhere.


Use clean/soft/distilled water


Failure mode for HVAC should be open/full blast AC


Separate venting for the Internet Data Center from the broader building.


All AC should be on the same power system as the rest of the data center


Water pipes/drain pipes above facilities raise concerns, as do upper-floor drains clogging. One solution was rubberized floors above the data center.


One concern was volatile organic compounds in the facility (radon, etc.). An approach would be to specify X number of air changes per hour. 85% filtration of outside air was recommended.


Problem resolution of HVAC then needs to be pulled into the NOC in some way…


Standing water is bad - condensation trays and sewer backup system stories.


Timely A/C maintenance is required.

Fire Suppression

Pre-action sprinklers - require air in the pipe prior to water

Two stage systems are recommended that trigger only when two zone sensors go off.

Trigger in different zones as needed

Highly desirable to have a standby switch to abort firing of the system.


In some areas trigger needs to also cut power...

Fire drills are required prior to opening an Internet Data Center. This may be expensive. In some areas, two firings are required, separated by a 20-minute ventilation time.

FM-200 systems are common as a replacement for Halon, water, and chemical systems.

Heat sensors and smoke sensors are required.

Disaster recovery issues

Drains

Drills and training of staff

Recommendation to talk with local fire company

Make sure the fire alarm causes the security system to fail full open for the fire department

Need manual forced dump (this was added upstairs and I don't know what is meant here.)

Need many large extinguishers


Physical Security

Recommended that walls stop at concrete and there should be no raised ceilings


Cellular phone for alarm and monitoring system


Battery backups for the entry access system - how long should the facility stay up after a loss of power? > 6 hours required.


Fireproof, high-stress, shatterproof glass


Pizza PO to prevent engineer riots; coffee, sushi, & espresso in emergency facilities


Fire alarm defeats security system for safety reasons


Keyed (personal) biometric access control & off-site logging


Air lock - one-at-a-time access. No piggyback entry.


Motion detectors


Cable Management

Tie wrap - not too tight - or Velcro screwed down

Fiber - square open face conduit

Horizontal fiber management units

In-rack cable tie-down panels

Bundles and patch panels (e.g. 50 pair or 100 pair copper, 24 or 48 strand fiber - single and multimode.)


Data Plant

T1 cabling

Cat 5 cabling

T3 distribution - no crimp connectors!

Hierarchical - T1/T3/Ether/Fiber

Patch Panels

Star or modified star topologies

DACS - CNR

T1 in-line testing

DSX Panels

Optical splitter/monitor patch fees.


Fire drill importance - some test facilities weekly. Alarms going off, etc.

General Issue

Across all topics there is a Y2K issue.

Layer 1 Recommendations

Use of 506 category

We discussed the importance of fiber entrance diversity, that is, the ability for multiple carriers to enter the facility through different paths. Existing vaults, tunnels, shopping malls, and wireless were suggested as mechanisms to accomplish this.


We also pointed to the importance of locating data centers along telco fiber meet paths; this would make multiple-carrier ingress into the facility easier. The issue then becomes: how do we find out where the carriers lay their fiber so we can pick a good location for the Internet Data Center? Suggestions included looking at building permits (which are public records), asking install crews on the street doing an install, and simply asking the telco (which may require an NDA, a previous relationship, and perhaps $$ volume).


Voice Communication

We didn't get to discuss this.


Flooring & Ceiling

We didn't get to discuss this.

Where to find Information on Data Centers?

Discussions with CLECs and Vendors

Requirements lists for heating, venting, and air conditioning, and the responses from construction companies

Banks, the military, and financial institutions already have fairly robust generic data center specs (where?)

Telcos use Bellcore docs - describing wiring standards (how to get these?)


Data Center Mailing List:

send a message to [email protected] with "subscribe datacenter" in the body.


Quote of the Day

"Build your data center next to someone who needs it more than you do."


 


 

 


Appendix A - Calculating BTU Cooling Units


This spreadsheet was contributed by "Michael P. Lucking" <[email protected]>


All temps are in degrees F.

1) Windows exposed to the sun:


Use only one exposure: select the one that gives the largest result.

If no venetian blind or shading device is available, multiply by 1.4.


1.1) South ____ sq. ft x (Max outside temp - 30) = _____ BTU/HR

1.2) E/W/SE ____ sq. ft x (Max outside temp - 3) = _____ BTU/HR

1.3) NW ____ sq. ft x (Max outside temp - 23) = _____ BTU/HR

1.4) NE ____ sq. ft x (Max outside temp - 25) = _____ BTU/HR

1.5) N ____ sq. ft x (Max outside temp - 85) = _____ BTU/HR


Answer for #1 MAX (1.1 - 1.5)


2) All Windows not included in Item 1 (interior windows etc)

___ sq. ft x (Max exposure temp - 69) = _____ BTU/HR


3) Walls exposed to sun

(Use only the wall with the exposure used in item 1)


3.1) Light Construction ____ Lin. ft x (Max outside temp - 25) = _____ BTU/HR

3.2) Heavy Construction ____ Lin. ft x (Max outside temp - 55) = _____ BTU/HR



Heavy is defined as 12" masonry or insulation.


4) Shade Walls not included in Item 3

____ Lin. ft x (Max outside temp - 55) = _____ BTU/HR


5) Partitions

(Interior Wall adjacent to an unconditioned space)

____ Lin Ft x (Max temp - 50) = _____ BTU/HR


 

6) Ceiling or roof


6.1) Ceiling with Unconditioned occupied space above

___ Lin Ft x (Max temp - 90) = _____ BTU/HR



6.2) Ceiling with Attic Space above

6.2.1) No Insulation ___ Sq Ft x (max temp - 83) = _____ BTU/HR

6.2.2) 2" or more ___ Sq Ft x (max temp - 90) = _____ BTU/HR

6.3) Flat roof with no ceiling below

6.3.1) No Insulation ___ Sq Ft x (max temp - 85) = _____ BTU/HR

6.3.2) 2" or more ___ Sq Ft x (max temp - 90) = _____ BTU/HR


7) Floor

(over unconditioned space or vented crawl space; ignore any heat gain from a floor directly on the ground or over an unheated basement)


____ sq. ft x (Max temp - 90) = _____ BTU/HR


8) People (includes allowance for ventilation)


____ x 750 = _____ BTU/HR


9) Lights (if total wattage is known use 9.1, else 9.2)

9.1) ____ Watts x 4.25

9.2) ____ (sq ft floor space x 3) Watts x 4.25


10) Computer load (some computers/routers actually supply BTU/HR ratings; otherwise you will have to calculate it)

10.1) Total BTU/HR for all machines

10.2) Total max wattage x 3.4 = _______ BTU/HR


______________________

Sum all the BTU/HR; this is your total load factor.
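
As an illustration only, here is a minimal Python sketch of the worksheet arithmetic; the square footage, wall length, head count, wattage, and temperature are hypothetical, only a few of the items above are filled in, and the subtraction factors and multipliers come directly from the worksheet:

# Hedged sketch of the Appendix A worksheet arithmetic.  The square footage,
# wall length, head count, wattage, and temperature are hypothetical; the
# subtraction factors and multipliers come from the worksheet items above.
max_outside_temp = 95.0     # degrees F, hypothetical design condition

loads = {
    # Item 1.1: south-facing windows in the sun, no shading device (x 1.4)
    "sun_windows": 200 * (max_outside_temp - 30) * 1.4,
    # Item 3.1: light-construction walls with the same exposure (linear feet)
    "sun_walls": 120 * (max_outside_temp - 25),
    # Item 8: people, 750 BTU/HR each (includes ventilation allowance)
    "people": 6 * 750,
    # Item 9.1: lights, total wattage x 4.25
    "lights": 4000 * 4.25,
    # Item 10.2: computer/router load, max wattage x 3.4
    "computers": 60000 * 3.4,
}

total_btu_per_hr = sum(loads.values())
for name, value in loads.items():
    print("%-12s %10.0f BTU/HR" % (name, value))
print("Total load factor: %.0f BTU/HR" % total_btu_per_hr)

Each remaining worksheet item can be added as another entry in the same way before summing.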


 


Appendix B - NANOG 14 Data Center BOF Pre-Meeting Conference Call for Discussion Points



Data Center Needs, Problems and Technologies

NANOG 14 BOF 7:30-9PM


On Conference Call:

Bill Norton <[email protected]>

Sean Donelan <[email protected]>

Jay Adelson <[email protected]>

Justin Newton <[email protected]>


Abstract:

Internet facilities need to grow more robust to meet the needs of today's networking environment. Outages due to air conditioning problems, accidental circuit pulls, and shortages of space and bandwidth at collocation and exchange facilities all lead to reliability issues that now affect millions of users worldwide. This BOF is intended to highlight concerns and technology problems that affect the robustness of the Internet Data Center facilities. We hope that specific recommendations on areas for improvement in infrastructure can be made that the community can adopt when constructing or improving infrastructure facilities.


We discussed a few approaches for this BOF and agreed that the BOF should be informal, with perhaps a welcome and brief introduction of the panel by Bill, and a discussion of Internet Facilities Issues.


Discussion Points

-----------------

Confidentiality Issue - so many are hiding mistakes that, as an industry, we are failing to effectively learn from others' mistakes. This BOF is a sharing forum.


Hollywood vs. Reality - the appearance of reliable production data centers is often much duller than what folks may expect, focusing on function over form.


Biggest Problems Facing Internet Facilities - Managing Growth.

Cable Management - keeping track of # of cables, what goes where, the decision to leave a cable or pull it, how many cables to pre-install. Which exchange point do you think can best track down a wire failure? Sophisticated SW for doing this stuff exists.


No Internet Standards for facilities.

BellCore classes exist for TelCo

BICSI - the Building Industry Consulting Service International - defines standards for, e.g., the number of data ports in an office building (TIA-568).


Data Industry - Not Invented Here Syndrome. All Internet Engineers have a religious view about the "right" way for a closet to look. We need something like a common practices resource for hub/cable layout.


Phone companies have folks dedicated to nothing but cable plant installs. Don't let folks do their own wiring.


Power in Data Centers - planning is difficult, including planning for power maintenance. Example: a single panel feeds both the power feed and the UPS feed; the panel cannot be serviced without losing both.


Some Resources on Power may be available for our community:

NEBS stds on power

Military standards on power

Hospital standards on power

Service Requirements for "Critical Services" - recover in < 6 seconds

Service Requirements for "Essential Services" - recover in < 10 minutes

Service Requirements for "Routine Services" - recover in < 2 hours


Design of Data Centers - who is qualified? Any Internet engineer? What expertise is required to build a commercial-grade Internet facility? There is no "Certified Data Center Consultant," and no good standards or common-practices docs.

Jay brought up his current experience of hiring a construction company that has on staff folks from large data center crews (IBM, etc.). Expertise includes experience provisioning 100%-availability services: hospitals, police stations, financial markets, and the military.


Testing Internet Facilities and Fire Drills


Power Factor Discussion


The Internet world is rapidly deploying and adding UPS as an afterthought, treating UPS as power conditioning, whereas electrical engineers would focus on building "power redundancy" in from the ground up.


Jay can talk about trade-offs in building facilities: building on a budget, expansion criteria.


Broader issue - how to build quality Internet Facilities into existing office space? Can it be done?


Comments/Additions welcome...


Bill

Speakers
Sean Donelan, Data Research Associates
Bill Norton, Equinix

Full Abstract

Speakers
Craig Labovitz, Merit Network

Full Abstract

Speakers
Duane Wessels, University of California, San Diego

Full Abstract

Speakers
Moderator - Bill Manning, ISI
Panelist - Steve Feldman
Panelist - John Meylor, Cisco Systems
Panelist - Christian Nielsen
Panelist - Bill Norton, Equinix
Panelist - Jeremy Porter
Panelist - Dave Siegel
Panelist - David Thomas

Recordings
Full Abstract

A review of methods for inter-provider communications and their effectiveness. Donelan reviews past problems and speculates about future trends and possible solutions. Are there methods to assure communications will work under unusual conditions? What role should ISPs play in critical infrastructure protection planning?

Speakers
Sean Donelan, Data Research Associates

Full Abstract

During the summer of 1997, when I was working on Internet peering issues at iMCI, I had a chance to help track down a couple of unusual peering activities. These involved rewriting eBGP next hops to some NAP routers, passing third-party next hops, pointing default, and registering incorrect DNS names for NAP routers. Some of these activities were due to misconfiguration, such as running IGP protocols over the NAP FDDIs or turning on native IP multicasting on the NAPs.

Case #1: Rewriting eBGP Next Hops

At this time iMCI had two routers at MAE-East, called cpe2 and cpe3. I was informed by the NOC that the cpe2 FDDI inbound was very congested. I turned on the Netflow feature on iMCI's routers and found that 15% of the incoming traffic from cpe2 was unaccounted for. This meant that the traffic was coming from someone iMCI did not peer with at MAE-East. Further analysis showed that almost all the traffic was coming from a single subnet.

The BGP routing table showed that that subnet was 3 AS hops away from iMCI. iMCI had established private peering with ISP-1, ISP-1 peered with ISP-2, and ISP-2 peered with ISP-3, which owned the subnet. iMCI did not peer with any of these ISPs at the NAPs, so the only way ISP-3 could point next hop to iMCI was to rewrite the next hop by matching iMCI's AS number in its eBGP routes.

I placed an ACL packet filter on cpe2 to block traffic from ISP-3. Luckily, ISP-3 only had a single block of addresses, which made it possible to do packet-level filtering. ISP-3 then changed its routing to go through ISP-2 and ISP-1. After I removed the packet filter, however, ISP-3 pointed its traffic back to iMCI again.

I then decided to create something even more interesting. I designed a filter to let ISP-3 pass ICMP packets, traceroute packets, and DNS packets, but block all other IP packets, just to add some complexity to their troubleshooting. The filter was there for four days; apparently they could not pinpoint the problem, and finally switched traffic back to normal.
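
The unaccounted-traffic check described at the start of this case can be approximated offline. A minimal sketch (not iMCI's actual tooling): it flags flow source addresses that fall outside every prefix learned from a configured peer, using hypothetical peer names, prefixes, and flow records:

# Hedged sketch of the "unaccounted traffic" check described in Case #1:
# any flow whose source address is not covered by a prefix learned from a
# configured peer at this exchange is traffic we should not be receiving.
# The peer names, prefixes, and flow records below are hypothetical.
from ipaddress import ip_address, ip_network

peer_prefixes = {
    "ISP-A": [ip_network("192.0.2.0/24")],
    "ISP-B": [ip_network("198.51.100.0/24")],
}

flows = [                       # (source address, bytes) per flow record
    ("192.0.2.17", 120000),
    ("198.51.100.4", 80000),
    ("203.0.113.9", 950000),    # not covered by any peer prefix
]

def accounted(src):
    addr = ip_address(src)
    return any(addr in pfx for pfxs in peer_prefixes.values() for pfx in pfxs)

unaccounted = [(src, size) for src, size in flows if not accounted(src)]
total_bytes = sum(size for _, size in flows)
bad_bytes = sum(size for _, size in unaccounted)
print("Unaccounted sources:", unaccounted)
print("Unaccounted share of traffic: %.0f%%" % (100.0 * bad_bytes / total_bytes))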

Case #2: Passing a Third-Party Next Hop

This case was not flagged by the Netflow analysis, because the reverse path lookup did not fail. iMCI peered with ISP-4 at MAE-East, but not with ISP-5. ISP-4 not only passed our next hop to ISP-5, but also passed ISP-5's next hop directly to us. In this way, we exchanged traffic with ISP-5 directly. We could, of course, manually overwrite all the next hops to ISP-4's address, but we still could not stop ISP-4 from passing our next hop to ISP-5. After exchanging some email with ISP-4, they agreed to fix this. Some might believe that it was more efficient to pass traffic between third parties over multi-access media. However, I believed that peering was not just an engineering issue, but also a business issue.

Case #3: Pointing Default

Some routers at the exchange points simply pointed default to others. One strange example was a router at MAE-East that pointed default to UUnet; the router name had a reverse DNS lookup of xxx.internetmci.net. UUnet therefore asked me if this was one of our routers. I knew that we registered mci.net and internetmci.com, but was not sure about internetmci.net. A couple of days later, the IXP router pointed default to us instead of to UUnet. Since they claimed to be MCI, we did an SNMP query to the router, and it returned:

ip.ipRouteTable.ipRouteEntry.ipRouteNextHop.0.0.0.0 = IpAddress: 192.41.177.180

The IP address was our cpe3's FDDI address at MAE-East. I also obtained the AS number of the router and was able to find out who the router belonged to. After several email exchanges, the owner changed their default route so it no longer pointed to us, but to someone else ;-) Somehow, they believed a router had to have a default route.
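
For reference, a query like the one above can be reproduced with the pysnmp library; this is a minimal sketch, assuming the router answers SNMPv1 reads and using placeholder values for the router address and community string:

# Hedged sketch of the SNMP query described above: fetch ipRouteNextHop for
# the default route (0.0.0.0) from a router.  The address and community
# string are placeholders, and the router is assumed to answer SNMPv1 reads.
from pysnmp.hlapi import (
    getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
    ContextData, ObjectType, ObjectIdentity,
)

# Numeric OID for ip.ipRouteTable.ipRouteEntry.ipRouteNextHop.0.0.0.0
oid = "1.3.6.1.2.1.4.21.1.7.0.0.0.0"

error_indication, error_status, error_index, var_binds = next(
    getCmd(
        SnmpEngine(),
        CommunityData("public", mpModel=0),       # mpModel=0 selects SNMPv1
        UdpTransportTarget(("192.0.2.1", 161)),   # placeholder router address
        ContextData(),
        ObjectType(ObjectIdentity(oid)),
    )
)

if error_indication or error_status:
    print("Query failed:", error_indication or error_status)
else:
    for var_bind in var_binds:
        print(" = ".join(x.prettyPrint() for x in var_bind))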
Other Activities

Other issues were debatable. I found some ISPs running IGPs over the FDDI at the NAPs. When I sent the ISPs e-mail, they told me that this was due to a misconfiguration. One could argue this was a way to create redundancy at the NAPs. But usually an ISP's routers were at the same location, or on the same rack at the NAPs, so redundancy could be achieved by using a private LAN instead of bothering other routers. One or two routers/workstations on MAE-East were using native IP multicast. I calculated that about 2Mbps of this traffic was going over the NAP FDDI, but didn't know if any router on the LAN was receiving the traffic. I talked to the senders, who told me that they planned to shut down multicast on their box.

I spent a lot of time dealing with route consistency across different peering points. On the Internet today, we use shortest-exit routing. If the routes we learned from our peers were not consistent across all the peering points, then we might carry some traffic unnecessarily across our backbone. Sometimes we had to avoid certain peering points because of severe congestion; this was usually done through mutual agreement.
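
One way to check that consistency is simply to compare the prefix sets a peer announces at each peering point. A minimal sketch, with hypothetical peering points and prefixes:

# Hedged sketch of a route-consistency check across peering points: a peer
# using shortest-exit routing should announce the same prefixes everywhere,
# and differences can shift traffic onto our backbone.  The peering points
# and prefixes below are hypothetical examples.
announcements = {
    "exchange-east": {"192.0.2.0/24", "198.51.100.0/24"},
    "exchange-west": {"192.0.2.0/24"},
    "exchange-central": {"192.0.2.0/24", "198.51.100.0/24"},
}

all_prefixes = set().union(*announcements.values())
for point, prefixes in sorted(announcements.items()):
    missing = all_prefixes - prefixes
    if missing:
        print("%s: peer does not announce %s" % (point, sorted(missing)))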

To deal with those unusual activities, we need to detect them, communicate with the right people, and, sometimes, find a way to stop them. To detect problems, I was able to use:
  • Netflow statistics for reverse route lookup
  • Traceroute with "-g" switch
  • If LSR is disabled on the other end and you are trying to investigate how that router routes certain prefixes, you can temporarily create a static route pointing to the router in question and then trace to that address. If that router routes the trace packet back to you, you will see the trace going back and forth between your router and the other router.
  • MAC address accounting.

Methods you can use to stop unwanted traffic include:
  • Packet level filtering if the router CPU can handle it
  • MAC address filtering/rate-limit, sometimes combined with WRED
  • Filter out routes from the network if necessary

Preventive practices include:
  • NAP GIGAswitch L2 filtering
  • Use next-hop-self and always overwrite the next hop to your peer's address.
  • If you don't have a customer on the NAP routers, remove the non-customer routes from the NAP routers. This ensures that you only allow peers' traffic to go to your customer destinations.
  • Use the loopback address to do iBGP peering, so you don't have to carry NAP LAN address blocks over the network.
  • Use ATM PVCs

Speakers
Naiming Shen, Cisco Systems

Full Abstract

Some months ago @Home began evaluating CAR for some of the functionality that we required. We evaluated solely CAR's rate-limiting capabilities and the extent to which rate limiting impacts the network. In the process we discovered how CAR interacts with TCP, as well as the optimum configuration of burst parameters.
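
As background (not described in the abstract itself), CAR is essentially a token-bucket rate limiter, and the burst parameters size the bucket. A minimal sketch of the conform/exceed decision, with hypothetical rate and burst values rather than recommended settings:

# Hedged sketch of token-bucket rate limiting of the kind CAR performs:
# tokens accumulate at the committed rate up to the burst size, and a packet
# "conforms" only if enough tokens are available when it arrives.  The rate
# and burst values below are hypothetical, not recommended settings.
class TokenBucket:
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0        # token refill rate in bytes/second
        self.burst = float(burst_bytes)   # bucket depth in bytes
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0

    def conforms(self, now, packet_bytes):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes   # conform: transmit the packet
            return True
        return False                      # exceed: drop (or remark) the packet

bucket = TokenBucket(rate_bps=1000000, burst_bytes=15000)
for t, size in [(0.0, 1500), (0.001, 1500), (0.5, 9000), (0.501, 8000)]:
    print(t, size, "conform" if bucket.conforms(t, size) else "exceed")

A larger burst allowance lets short bursts of back-to-back packets through without drops, which is one reason the burst parameters matter for TCP behavior.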

Speakers
Cathy Wittbrodt, @Home

Full Abstract

Speakers
Moderator - Curtis Villamizar, ANS
Panelist - Paul Ferguson, Cisco Systems
Panelist - John Stewart, Juniper
Panelist - Jeff Wabik, Netstar/Ascend
Panelist - Hank Zannini, Avici
Speaker - Steve Willis, Argon

Full Abstract

Speakers
Craig Labovitz, Merit Network

Tuesday, November 10, 1998
Topic/Presenter
Full Abstract

Speakers
Stan Barber, Texas GigaPoP
Ron Hutchins, Southern CrossRoads, Atlanta GigaPoP
Mark Johnson, North Carolina Networking Initiative
Dave Meyer, Oregon GigaPoP

Full Abstract

Speakers
Kim Hubbard, ARIN

Recordings
Full Abstract

Speakers
Mark Kosters, InterNIC