When implementing an LSI Nytro WarpDrive (NWD) or Nytro MegaRAID (NMR) PCIe flash card in a Linux server, you need to modify quite a few variables to get the best performance out of these cards.

In the Linux server, device assignments sometimes change after reboots. Sometimes, the PCIe flash card can be assigned /dev/sda. Other times, it can be assigned /dev/sdd, or any device name. This variability can wreak havoc when modifying the Linux environment variables. To get around this issue, assignments by the SCSI address should be used so all of the Linux performance variables will persist properly across reboots. If using a filesystem, use the device UUID address in the mount statement in /etc/fstab to persist the mount command across reboots.
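For example, a UUID-based mount entry might look like this (a minimal sketch; the UUID, mount point and filesystem type are placeholders, so substitute the values that blkid reports for your card):

# Find the filesystem UUID on the Nytro device (the device name here is only an example)
blkid /dev/sdb1

# /etc/fstab entry keyed by UUID so the mount survives device-name changes across reboots
UUID=0a1b2c3d-4e5f-6789-abcd-ef0123456789  /mnt/nytro  xfs  defaults,noatime  0  0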

Cut and paste the script
The first step is to copy the following script, substituting the SCSI address of your PCIe card for the example address in the grep command, into a file that /etc/rc.local will call at boot (as described below). You’ll need to enter the SCSI address of the PCIe card before executing the script.

nwd_getdevice.sh
#!/bin/bash
# Resolve the kernel device name (sdX) behind this card's SCSI address.
# Note the single space kept between the SCSI address and the closing quote.
ls -al /dev/disk/by-id | grep 'scsi-3600508e07e726177965e06849461a804 ' | grep /sd > nwddevice.txt
awk '{split($11,arr,"/"); print arr[3]}' nwddevice.txt > nwd1device.txt
variable1=$(cat nwd1device.txt)
# Apply the performance settings to whichever /dev/sdX name the card received on this boot.
echo "4096" > /sys/block/$variable1/queue/nr_requests      # deepen the request queue
echo "512" > /sys/block/$variable1/device/queue_depth      # raise the SCSI device queue depth
echo "deadline" > /sys/block/$variable1/queue/scheduler    # use the deadline I/O scheduler
echo "2" > /sys/block/$variable1/queue/rq_affinity         # complete I/O on the submitting CPU
echo 0 > /sys/block/$variable1/queue/rotational            # flag the device as non-rotational (flash)
echo 0 > /sys/block/$variable1/queue/add_random            # don't feed the entropy pool from this device
echo 1024 > /sys/block/$variable1/queue/max_sectors_kb     # allow larger I/O transfer sizes
echo 0 > /sys/block/$variable1/queue/nomerges              # leave request merging enabled
blockdev --setra 0 /dev/$variable1                         # disable read-ahead

The example SCSI address in the grep command above needs to be replaced with the SCSI address of your PCIe flash card. To get the address, issue this command:

ls -al /dev/disk/by-id

When you install the Nytro PCIe flash card, Linux assigns the device a name such as /dev/sdX, where X can be any letter. The output of the ls command above shows the SCSI address for this PCIe device. Don’t use an entry containing “-partX”; those refer to partitions rather than the whole card. Be sure to note this SCSI address, since you will need it to create the script. Include a single space between the SCSI address and the closing single quote in the grep command.
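A quick way to narrow the listing to whole-device entries (a convenience filter, not required):

ls -al /dev/disk/by-id | grep scsi- | grep -v part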

Create nwd_getdevice.sh file
Next, copy the code and create a file called “nwd_getdevice.sh” with the modified SCSI address.

After saving this file, change file permissions to “execute” and then place this command in the /etc/rc.local file:

/path/nwd_getdevice.sh
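Concretely (the path is a placeholder for wherever you saved the script):

chmod +x /path/nwd_getdevice.sh
# Then add this line to /etc/rc.local, before any trailing "exit 0":
/path/nwd_getdevice.sh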

Test the script
To test this script, execute it on the command line exactly as you entered it in the rc.local file. On each subsequent reboot, the settings will be applied to the appropriate device automatically.
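As a quick sanity check after running it (this assumes the device name was captured in nwd1device.txt by the script):

variable1=$(cat nwd1device.txt)
cat /sys/block/$variable1/queue/scheduler    # the active scheduler appears in brackets, e.g. [deadline]
cat /sys/block/$variable1/queue/nr_requests  # should report 4096
blockdev --getra /dev/$variable1             # should report 0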

Multiple PCIe flash cards
If you plan to deploy multiple LSI PCIe flash cards in the server, the easiest way is to duplicate all of the commands in the nwd_getdevice.sh script, paste them at the end of the file, and then replace the SCSI address in the newly pasted block with the address of the next card. You can repeat this procedure for as many LSI PCIe flash cards as are installed in the server. For example (a loop-based alternative is sketched after the example):

nwd_getdevice.sh
ls -al /dev/disk/by-id |grep 'scsi-1stscsiaddr83333365e06849461a804 ' |grep /sd > nwddevice.txt
awk '{split($11,arr,"/"); print arr[3]}' nwddevice.txt > nwd1device.txt
variable1=$(cat nwd1device.txt)
echo "4096" > /sys/block/$variable1/queue/nr_requests
echo "512" > /sys/block/$variable1/device/queue_depth
echo "deadline" > /sys/block/$variable1/queue/scheduler
echo "2" > /sys/block/$variable1/queue/rq_affinity
echo 0 > /sys/block/$variable1/queue/rotational
echo 0 > /sys/block/$variable1/queue/add_random
echo 1024 > /sys/block/$variable1/queue/max_sectors_kb
echo 0 > /sys/block/$variable1/queue/nomerges
blockdev --setra 0 /dev/$variable1
ls -al /dev/disk/by-id |grep 'scsi-2ndscsiaddr1234566666654444444444 ' |grep /sd > nwddevice.txt
awk '{split($11,arr,"/"); print arr[3]}' nwddevice.txt > nwd1device.txt
variable1=$(cat nwd1device.txt)
echo "4096" > /sys/block/$variable1/queue/nr_requests
echo "512" > /sys/block/$variable1/device/queue_depth
echo "deadline" > /sys/block/$variable1/queue/scheduler
echo "2" > /sys/block/$variable1/queue/rq_affinity
echo 0 > /sys/block/$variable1/queue/rotational
echo 0 > /sys/block/$variable1/queue/add_random
echo 1024 > /sys/block/$variable1/queue/max_sectors_kb
echo 0 > /sys/block/$variable1/queue/nomerges
blockdev --setra 0 /dev/$variable1
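If you would rather avoid the copy-and-paste duplication, a loop over the SCSI addresses does the same job. This is only a sketch (the two addresses in the list are placeholders, and the tuning values simply mirror the script above):

#!/bin/bash
# One entry per Nytro card; replace with the real SCSI addresses from /dev/disk/by-id.
for scsiaddr in scsi-1stscsiaddrplaceholder scsi-2ndscsiaddrplaceholder; do
    dev=$(basename "$(readlink -f /dev/disk/by-id/$scsiaddr)")
    echo "4096" > /sys/block/$dev/queue/nr_requests
    echo "512" > /sys/block/$dev/device/queue_depth
    echo "deadline" > /sys/block/$dev/queue/scheduler
    echo "2" > /sys/block/$dev/queue/rq_affinity
    echo 0 > /sys/block/$dev/queue/rotational
    echo 0 > /sys/block/$dev/queue/add_random
    echo 1024 > /sys/block/$dev/queue/max_sectors_kb
    echo 0 > /sys/block/$dev/queue/nomerges
    blockdev --setra 0 /dev/$dev
done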

Final thoughts
The most important step in implementing Nytro PCIe flash cards under Linux is aligning the card on a boundary, which I cover in Part 1 of this series. This step alone can deliver a 3x or greater performance gain, based on our in-house tests as well as testing by some of our customers. The rest of this series walks you through setting up these aligned flash cards with a filesystem, Oracle ASM or a RAW device and, finally, persisting all of the Linux performance variables for the card so the settings survive reboots.

Links to the other posts in this series:

How to maximize PCIe flash performance under Linux

Part 1: Aligning PCIe flash devices
Part 2: Creating the RAW device or filesystem
Part 3: Oracle ASM



LSI’s Accelerating Innovation Summit in San Jose has given me a sneak peek of some solutions our partners are putting together to solve datacenter challenges. Such is the case with EMC’s ScaleIO business unit (EMC recently acquired ScaleIO), which has rolled out some nifty software that helps streamline VDI (Virtual Desktop Infrastructure) scaling.

As I shared in a previous blog, VDI deployments are growing like gangbusters. It’s easy to see why. The manageability and security benefits of a virtualized desktop environment are tough to beat. Deploying and supporting hundreds of desktops as VDI instances on a single server lets you centralize desktop management and security. Another advantage is that patches, security updates, and hardware and software upgrades demand much less overhead. VDI also dramatically reduces the risk that desktop users will breach security by making it easier to prevent data from being copied onto portable media or sent externally.

Mass boots drag down VDI performance
But as with all new technologies, a number of performance challenges can crop up when you move to a virtual world. In enterprise-scale deployments, VDI performance can suffer when the IT administrator attempts to boot all those desktops Monday morning or reboot after Patch Tuesday. What’s more, VDI performance can drop significantly when users all log in at the same time each morning. In addition, virtualized environments sometimes are unfriendly to slews of users trying to access files simultaneously, making them wait because of the heavy traffic load. One frequent bottleneck is legacy SAN-connected storage, since file access requests are queued through a single storage controller. And of course increasing the density of virtual desktops supported by a server can exacerbate the whole performance problem.

VDI deployments are ripe for distributed storage, and the ScaleIO ECS (Elastic Converged Storage) software is a compelling solution, incorporating an elastic storage infrastructure that scales both capacity and performance with changing business requirements. The software pools local direct attached storage (DAS) on each server into a large storage repository. If desktops are moved between physical servers, or if a server fails, the datacenter’s existing high-speed network moves data to the local storage of the new server.

LSI Nytro and ScaleIO ECS software boost VDI session number, reduce costs
In an AIS demonstration, the ScaleIO ECS software leverages the application acceleration of the LSI® Nytro™ MegaRAID® card to significantly increase the number of VDI sessions the VDI server can support, reducing the cost of each VDI session by up to 33%. Better yet, application acceleration gives users shorter response times than they see on their laptops. By using the ScaleIO ECS software and Nytro MegaRAID card, customers get the benefits of high-availability storage and intelligent flash acceleration at a more budget-friendly price point than comparable SAN-based solutions.

 



Optimizing the work per dollar spent is a high priority in datacenters around the world. But there aren’t many ways to accomplish that. I’d argue that integrating flash into the storage system drives the best – sometimes most profound – improvement in the cost of getting work done.

Yeah, I know work/$ is a US-centric metric, but replace the $ with your favorite currency. The principle remains the same.

I had the chance to talk with one of the execs who’s responsible for Google’s infrastructure last week. He talked about how his fundamental job was improving performance/$. I asked about that, and he explained “performance” as how much work an application could get done. I asked if work/$ at the application was the same, and he agreed – yes – pretty much.

Remember how, as a kid, you brought along a big brother as authoritative backup? OK – so my big brother Google and I agree – you should be trying to optimize your work/$. Why? Well – it could be to spend less, or to do more with the same spend, or to do things you could never do before, or simply to cope with the non-linear expansion in IT demands even as budgets are shrinking. Hey – that’s the definition of improving work/$… (And as a bonus, if you do it right, you’ll have a positive green impact that is bound to be worth brownie points.)

Here’s the point. Processors are no longer scaling the same – sure, there are more threads, but not all applications can use all those threads. Systems are becoming harder to balance for efficiency. And often storage is the bottleneck. Especially for any application built on a database. So sure – you can get 5% or 10% gain, or even in the extreme 100% gain in application work done by a server if you’re willing to pay enough and upgrade all aspects of the server: processors, memory, network… But it’s almost impossible to increase the work of a server or application by 200%, 300% or 400% – for any money.

I’m going to explain how and why you can do that, and what you get back in work/$. So much back that you’ll probably be spending less and getting more done. And I’m going to explain how even for the risk-averse, you can avoid risk and get the improvements.

More work/$ from general-purpose DAS servers and large databases
Let me start with a customer. It’s a bank, and it likes databases. A lot. And it likes large databases even more. So much so that it needs disks to hold the entire database. Using an early version of an LSI Nytro™ MegaRAID® card, it got 6x the work from the same individual node and database license. You can read that as 600% if you want. It’s big. To be fair – that early version had much more flash than our current products, and was much more expensive. Our current products give much closer to 3x-4x improvement. Again, you can think of that as 300%-400%. Again, slap a Nytro MegaRAID into your server and it’s going to do the work of 3 to 4 servers. I just did a web search and, depending on configuration, Nytro MegaRAIDs are $1,800 to $2,800 online. I don’t know about you, but I would have a hard time buying 2 to 3 configured servers + software licenses for that little, but that’s the net effect of this solution. It’s not about faster (although you get that). It’s about getting more work/$.

But you also want to feel safe – that you’re absolutely minimizing risk. OK. Nytro MegaRAID is a MegaRAID card. That’s overwhelmingly the most common RAID controller in the world: it’s used by 9 of the top 10 OEMs, and it protects 10’s to 100’s of millions of disks every day. The Nytro version adds private flash caching in the card and stores hot reads and writes there. Writes to the cache use a RAID 1 pair. So if a flash module dies, you’re protected. If the flash blocks or chip die wear out, the bad blocks are removed from the cache pool, and the cache shrinks by that much, but everything keeps operating – it’s not like a normal LUN that can’t change size. What’s more, flash blocks typically wear out during the erase cycle – so no data is lost. And as a bonus, you can eliminate the traditional battery most RAID cards use – the embedded flash covers that – so no more annual battery service is needed. This is a solution that will continue to improve work/$ for years and years, all the while getting 3x-4x the work from that server.

More work/$ from SAN-attached servers (without actually touching the SAN)
That example was great – but you don’t use DAS systems. Instead, you use a big iron SAN. (OK, not all SANs are big iron, but I like the sound of that expression.) There are a few ways to improve the work from servers attached to SANs. The easiest of course is to upgrade the SAN head, usually with a flash-based cache in the SAN controller. This works, and sometimes is “good enough” to cover needs for a year or two. However, the server still needs to reach across the SAN to access data, and it’s still forced to interact with other servers’ IO streams in deeper queues. That puts a hard limit on the possible gains. 

Nytro XD caches hot data in the server. It works with virtual machines. It intercepts storage traffic at the block layer – the same place LSI’s drivers have always been. If the data isn’t hot, and isn’t cached, it simply passes the traffic through to the SAN. I say this so you understand – it doesn’t actually touch the SAN. No risk there. More importantly, the hot storage traffic never has to be squeezed through the SAN fabric, and it doesn’t get queued in the SAN head. In other words, it makes the storage really, really fast.

We’ve typically found work from a server can increase 5x to 10x, and that’s been verified by independent reviewers. What’s more, the Nytro XD solution only costs around 4x the price of a high-end SAN NIC. It’s not cheap, but it’s way cheaper than upgrading your SAN arrays, it’s way cheaper than buying more servers, and it’s proven to enable you to get far more work from your existing infrastructure. When you need to get more work – way more work – from your SAN, this is a really cost-effective approach. Seriously – how else would you get 5x-10x more work from your existing servers and software licenses?

More work/$ from databases
A lot of hyperscale datacenters are built around databases of a finite size. That may be 1, 2 or even 4 TBytes. If you use Apple’s online services for iTunes or iCloud, or if you use Facebook, you’re using this kind of infrastructure.

If your datacenter has a database that can fit within a few TBytes (or less), you can use the same approach. Move the entire LUN into a Nytro WarpDrive® card, and you will get 10x the work from your server and database software. It makes such a difference that some architects argue Facebook and Apple cloud services would never have been possible without this type of solution. I don’t know, but they’re probably right. You can buy a Nytro WarpDrive for as little as a low-end server. I mean low end. But it will give you the work of 10. If you have a fixed-size database, you owe it to yourself to look into this one.

More work/$ from virtualized and VDI (Virtual Desktop) systems
Virtual machines are installed on a lot of servers, for very good reason. They help improve the work/$ in the datacenter by reducing the number of servers needed and thereby reducing management, maintenance and power costs. But what if they could be made even more efficient?

Wall Street banks have benchmarked virtual desktops. They found that Nytro products drive these results: support of 2x the virtual desktops, 33% improvement in boot time during boot storms, and 33% lower cost per virtual desktop. In a more general application mix, Nytro increases work per server 2x-4x.  And it also gives 2x performance for virtual storage appliances.

While that’s not as great as 10x the work, it’s still a real work/$ value that’s hard to ignore. And it’s the same reliable MegaRAID infrastructure that’s the backbone of enterprise DAS storage.

A real example from our own datacenter
Finally – a great example of getting far more work/$ was an experiment our CIO Bruce Decock did. We use a lot of servers to fuel our chip-design business. We tape out a lot of very big leading-edge process chips every year. Hundreds. And that takes an unbelievable amount of processing to get what we call “design closure” – that is, a workable chip that will meet performance requirements and yield. We use a tool called PrimeTime that figures out timing for every signal on the chip across different silicon process points and operating conditions. There are 10’s to 100’s of millions of signals. And we run every active design – 10’s to 100’s of chips – each night so we can see how close we’re getting, and we make multiple runs per chip. That’s a lot of computation… The thing is, electronic CAD tools are designed to avoid touching storage (otherwise jobs would never finish), relying only on /tmp space. But CAD does use huge amounts of memory for its data structures, and that means swap space on the order of TBytes. These CAD tools usually don’t need to run faster. They run overnight and results are ready when the engineers come in the next day. These are impressive machines: 384G or 768G of DRAM and 32 threads. How do you improve work/$ in that situation? What did Bruce do?

He put LSI Nytro WarpDrives in the servers and pointed /tmp at the WarpDrives. Yep. Pretty complex. I don’t think he even had to install new drivers. The drivers are already in the latest OS distributions. Anyway – like I said – complex.
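For readers curious what pointing /tmp at a WarpDrive can look like in practice, here is a minimal sketch; the device name and filesystem choice are assumptions, not the exact steps Bruce used:

# Assumes the WarpDrive shows up as /dev/sdX; device name and filesystem are illustrative only.
mkfs.xfs /dev/sdX          # create a filesystem on the card
mount /dev/sdX /tmp        # point /tmp (the CAD scratch space) at the WarpDrive
# Add a UUID-based /etc/fstab entry if the mount needs to persist across reboots.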

The result? WarpDrive allowed the machines to fully use the CPU and memory with no I/O contention. With WarpDrive, the PrimeTime jobs for static timing closure of a typical design could be done on 15 vs. 40 machines. That’s each Nytro node doing 260% of the work vs. a normal node and license. Remember – those are expensive machines (have you priced 768G of DRAM and do you know how much specialized electronic design CAD licenses are?) So the point wasn’t to execute faster. That’s not necessary. The point is to use fewer servers to do the work. In this case we could do 11 runs per server per night instead of just 4. A single chip design needs more than 150 runs in one night.

To be clear, the Nytro WarpDrives are a lot less expensive than the servers they displace. And the savings go beyond that – less power and cooling. Lower maintenance. Less admin time and overhead. Fewer licenses. That’s definitely improved work/$ for years to come. Those Nytro cards are part of our standard flow, and they should probably be part of every chip company’s design flow.

So you can improve work/$ no matter the application, no matter your storage model, and no matter how risk-averse you are.

Optimizing the work per dollar spent is a high – maybe the highest – priority in datacenters around the world. And just to be clear – Google agrees with me. There aren’t many ways to accomplish that improvement, and almost no ways to dramatically improve it. I’d argue that integrating flash into the storage system is the best – sometimes most profound – improvement in the cost of getting work done. Not so much the performance, but the actual work done for the money spent. And it ripples through the datacenter, from original CapEx, to licenses, maintenance, admin overhead, power and cooling, and floor space for years. That’s a pretty good deal. You should look into it.

For those of you who are interested, I already wrote about flash in these posts:
What are the driving forces behind going diskless?
LSI is green – no foolin’

 
