August was always an exciting time at my childhood home. We kids were excited that school was starting in September, and Mom was relieved that summer was coming to an end. I remember the annual trips to the local department stores to buy school clothes. It was always exciting to pick out new school clothes and a new winter coat. With only a few stores to choose from, many of us wore similar clothes and coats when classes started.
As consumers, we have far more fashion and store options today. There are specialty stores at the mall, big box outlets, membership stores and specialty online portals. With so many more clothing designers than in years past, retailers are also inundated with fashion choices. The question becomes, “How does the fashion chain – from textile suppliers and clothing manufacturers to the retailers themselves – choose what to carry?”
They all rely on big data to make critical decisions. Let’s go to the start of the chain: the textile manufacturer. It may analyze previous years’ orders, competitive intelligence, purchasing trend data, and raw material and manufacturing costs. While tracking analytics on one data source is relatively easy, capturing and analyzing multiple data sources can be a tremendous challenge – a point underscored in a 2012 research report from Gartner. In its analysis, Gartner found that big data processing challenges don’t come from analysis of a single data set or source but rather from the complexity of interaction between two or more data sets.
“When combining large assets and new asset types, how they relate to each other becomes more complex,” the Gartner report explains. “As more assets are combined, the tempo of record creation and the qualification of the data within use cases becomes more complex.”
The next link is the clothing companies that create the fashion. They have a much more complex job, using big data to analyze fashion trends and improve their decision-making. Information such as historical sales, weather predictions, demographic data and economic details helps them choose the right colors, sizes and price points for the clothing they make.
Swim Suits and Snow Parkas
This is where we, as consumers, come into the picture. Just as I did many years ago, people still shop for school and winter clothing this time of year. The clothes on the racks at our favorite retailer or in an online catalogue were chosen and ordered six to nine months ago. Take Kohl’s. The nationwide retailer uses a blend of geographic weather prediction data sources to know where to best sell snow parkas versus swim suits, economic and competitive data to price them right, demographic data sources to better predict the required sizes and customer demand, and market trend data sources to better forecast the colors and styles that will sell best. The more accurately Kohl’s buyers can predict consumer behavior using big data, the less the retailer will need to discount overstock, and the higher its sales and profit.
As I stated in my previous blog posts, the Hadoop® architecture is a great tool for efficiently storing and processing the growing amount of data worldwide, but Hadoop is only as good as the processing and storage performance that supports it. As with flu strain and weather predictions, the more data you can quickly and efficiently analyze, the more accurate your prediction. When it comes to weather and flu vaccines, these predictions can help save lives, but in the fashion industry it is all about improving the bottom line.
Whether in fashion, medical, weather or other fields, the use of Hadoop for high levels of speed and accuracy in big data analysis requires computers with application acceleration. One such tool is LSI® Nytro™ Application Acceleration. You can go to TheSmarterWayToFaster™ for more information on the Nytro product family.
Part three of this three-part series continues to examine some of the diverse and potentially life-saving uses of big data in our everyday lives. It also explores how expanded data access and higher processing and storage speed can help optimize big data application performance.
Every year I diligently get in line for my annual flu (or, more technically accurate, “seasonal influenza”) shot. I’m not particularly fond of needles, but I have seen what the flu can do and how many people die each year from this seasonal virus.
When you get the flu shot – or, now, the nasal mist – you are trusting a lot of people that what you are taking will actually help protect you. According to the CDC (Centers for Disease Control and Prevention), there are three antigenic types (A, B and C) of influenza virus, and of those three types, two cause the seasonal epidemics we suffer through each year.
Not to get too technical, but I learned that the A strain is further classified by two proteins, which produce code names like H1N1, H3N2 and H5N1. These names can even be updated by year if the virus changes. An example of this was in 2009, when H1N1 became the 2009 H1N1. So where we may just call it H1N1, the World Health Organization has a whole taxonomy to describe a seasonal influenza strain.
This taxonomy includes:
As you can see, it can get complicated quickly. If you would like to go deeper, you can read more about this here. While much of this information seems pretty arcane to the lay reader, you can quickly see that the sheer volume of information collected, stored and analyzed to combat seasonal influenza is a great example of big data.
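To make the naming scheme a little more concrete, here is a minimal Python sketch that splits a strain designation into its parts. It assumes the commonly published convention of type, host (omitted for human viruses), geographic origin, strain number and year of isolation, followed for type A by the subtype in parentheses. The class name, field names and function name are my own labels for illustration, not part of any official tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StrainName:
    """Parts of an influenza strain designation (field names are illustrative)."""
    virus_type: str
    host: Optional[str]      # None when the host is human (the host is then omitted)
    origin: str
    strain_number: str
    year: str
    subtype: Optional[str]   # e.g. "H1N1"; None when no subtype is given


def parse_strain(name: str) -> StrainName:
    """Split a designation such as 'A/California/7/2009 (H1N1)' into fields."""
    designation, _, rest = name.partition("(")
    subtype = rest.split(")")[0].strip() if rest else None
    parts = [p.strip() for p in designation.strip().split("/")]
    if len(parts) == 4:
        virus_type, origin, number, year = parts
        host = None
    elif len(parts) == 5:
        virus_type, host, origin, number, year = parts
    else:
        raise ValueError(f"unexpected strain designation: {name!r}")
    return StrainName(virus_type, host, origin, number, year, subtype)


if __name__ == "__main__":
    for example in ("A/California/7/2009 (H1N1)",
                    "A/duck/Alberta/35/76 (H1N1)",
                    "B/Massachusetts/2/2012"):
        print(parse_strain(example))
```

Even this toy parser shows why the data adds up quickly: every sample carries several fields, and surveillance systems collect them from around the world.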
In the US, once the CDC sifts through this data – using big data analytics tools – it uses its findings to determine which strains might affect the US and build a flu shot to combat those strains. During the 2012/2013 season, the predominant virus was influenza A (H3N2), though influenza B viruses and, to a lesser extent, influenza A (H1N1)pdm09 (pH1N1) viruses also circulated. (See the full report here.)
In addition to identifying dominant viruses, the CDC also uses big data to track the spread and potential effect on the population. Reviewing information from prior outbreaks, population data, and even weather patterns, the CDC uses big data analytics to quickly estimate where viruses might hit first, hardest and longest so that a targeted vaccine can be produced in sufficient quantities, in the required timeframe and even for the right geography. The faster and more accurately this can be done, the more people can get this potentially life-saving vaccine before the virus travels to their area.
As I stated in my previous blog post, the Hadoop® architecture is a great tool for efficiently storing and processing the growing amount of data worldwide, but Hadoop is only as good as the processing and storage performance that supports it. As with weather predictions, the more data you can quickly and efficiently analyze, the greater the likelihood of an accurate prediction. When it comes to weather and flu vaccines, these predictions can help save lives. In my final blog post in this series, I will explore how big data helps the fashion industry.
Whether in medical, weather or other fields that leverage big data technologies, the use of Hadoop for high levels of speed and accuracy in big data analysis requires computers with application acceleration. One such tool is LSI® Nytro™ Application Acceleration. You can go to TheSmarterWayToFaster™ for more information on the Nytro product family.
Part two of this three-part series continues to examine some of the diverse and potentially life-saving uses of big data in our everyday lives. It also explores how expanded data access and higher processing and storage speed can help optimize big data application performance.
We all watch the local weather and wonder how forecasters predict (or in some cases mis-predict) the future of weather. While they may not all agree on the forecast, they do agree that the more current and historical data you have, the better your ability to predict what might happen over the next hours, days and weeks.
A term used to describe this growing amount of information is Big Data, and more and more of it leverages Hadoop, a flexible architecture that provides the analysis tools and scalability required to comb through and utilize all available data. When recently talking to a US-based meteorologist (the technical name for a degreed weather forecaster), I learned that meteorologists rely on many different weather models from various sources to help create their forecasts.
Weather spawns downpour of Big Data
These models collect massive amounts of weather information from around the world. Using this information, computers then run billions of calculations to mimic the motion of weather patterns in the Earth’s dynamic atmosphere and produce forecasts for any given location over time. It was interesting to learn that not all weather models are equal.
While weather modeling websites worldwide collect this atmospheric data and provide it to meteorologists, the European community is seen as having the most accurate information. When I asked why, I learned that European weather modeling sites have some of the fastest computer hardware and technology, enabling them to analyze more data faster, which produces better overall forecasts. The US weather professional I spoke with tends to use these European sites as part of his analysis, and when European models conflict with those from US sites, he often leans toward the European data.
His use of the European weather modeling sites points to the value of fast, accurate analysis of Big Data. It also underscores the implications of vast amounts of data overwhelming the ability of the compute and storage resources available to process it. An accurate and timely weather forecast is critical and a bad or missed forecast can have terrible and even deadly consequences.
A case in point: Hurricane Sandy
In this article on Hurricane Sandy forecast speed and accuracy, you can see how removing just one source of data can dramatically reduce the accuracy of predicting a critical event such as where a hurricane will make landfall. To be sure, the more data you can store and the faster you can process it for analysis, the greater your potential competitive advantage, even in the vaunted halls of meteorological analysis and prediction.
The Hadoop® architecture is a great tool for efficiently storing and processing the growing amount of data worldwide, but Hadoop is only as good as the processing and storage performance that supports it. This gets interesting as you think about and explore the ripple effect of accurate or inaccurate forecasting in many areas. In my next blog post I will explore one of those – flu vaccines.
Whether in meteorology or other fields that leverage Big Data technologies, the use of Hadoop for high levels of speed and accuracy in Big Data analysis requires computers with application acceleration. One such tool is LSI® Nytro™ Application Acceleration. You can go to TheSmarterWayToFaster™ for more information on the Nytro product family.
This three-part series examines some of the diverse uses of Big Data in our everyday lives. It also explores how expanded data access and higher processing and storage speed can help optimize Big Data application performance.
Tags: application acceleration, big data, European weather modeling, flash, flash storage, Hadoop, Hurricane Sandy, meteorology, Nytro, processing performance, storage performance, weather modeling
Big data and Hadoop are all about exploiting new value and opportunities with data. In financial trading, business and some areas of science, it’s all about being fastest or first to take advantage of the data. The bigger the data sets, the smarter the analytics. The next competitive edge with big data comes when you layer in flash acceleration. The challenge is scaling performance in Hadoop clusters.
The most cost-effective option emerging for breaking through disk I/O bottlenecks to scale performance is to use high-performance read/write flash cache acceleration cards. This is essentially a way to get more work for less cost, by bringing data closer to the processing. In testing, the LSI® Nytro™ product has been shown to reduce the time it takes to complete Hadoop software framework jobs by up to 33%.
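To put the 33% figure in perspective, the short Python sketch below converts a reduction in job completion time into the equivalent gain in jobs completed per unit of time. The 60-minute baseline job is a hypothetical number I picked for illustration, not a measured result, and the function names are my own.

```python
def accelerated_runtime(baseline_minutes: float, time_reduction: float) -> float:
    """Runtime after cutting completion time by a fraction (0.33 means 33%)."""
    if not 0 <= time_reduction < 1:
        raise ValueError("time_reduction must be in the range [0, 1)")
    return baseline_minutes * (1 - time_reduction)


def throughput_gain(time_reduction: float) -> float:
    """How many times more jobs fit in the same time window after the reduction."""
    if not 0 <= time_reduction < 1:
        raise ValueError("time_reduction must be in the range [0, 1)")
    return 1 / (1 - time_reduction)


if __name__ == "__main__":
    baseline = 60.0   # hypothetical job length in minutes (assumption)
    reduction = 0.33  # the "up to 33%" best case cited above
    print(f"Accelerated runtime: {accelerated_runtime(baseline, reduction):.1f} min")
    print(f"Throughput gain:     {throughput_gain(reduction):.2f}x")
```

The point of the arithmetic is that a one-third cut in completion time means roughly one and a half times as many jobs in the same window, which is where the "more work for less cost" argument comes from.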
Combining flash cache acceleration cards with Hadoop software is a big opportunity for end users and suppliers. LSI estimates that less than 10% of Hadoop software installations today incorporate flash acceleration [1]. This will grow rapidly as companies see the increased productivity and ROI of flash to accelerate their systems. And use of Hadoop software is also growing fast. IDC predicts a CAGR of as much as 60% by 2016 [2]. Drivers include IT security, e-commerce, fraud detection and mobile data user management. Gartner predicts that Hadoop software will be in two-thirds of advanced analytics products by 2015 [3]. There are many thousands of Hadoop software clusters already employed.
Where flash makes the most immediate sense is with those who have smaller clusters doing lots of in-place batch processing. Hadoop is purpose-built for analyzing a variety of data, whether structured, semi-structured or unstructured, without the need to define a schema or otherwise anticipate results in advance. Hadoop enables scaling that allows an unprecedented volume of data to be analyzed quickly and cost-effectively on clusters of commodity servers. Speed gains are about data proximity. This is why flash cache acceleration typically delivers the highest performance gains when the card is placed directly in the server on the PCI Express® (PCIe) bus.
PCIe flash cache cards are now available with multiple terabytes of NAND flash storage, which substantially increases the hit rate. We offer a solution with both onboard flash modules and Serial-Attached SCSI (SAS) interfaces to create high-performance direct-attached storage (DAS) configurations consisting of solid state and hard disk drive storage. This couples the low latency performance benefits of flash with the capacity and cost per gigabyte advantages of HDDs.
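Why the hit rate matters so much can be seen with a simple weighted average: each read is served either from flash (a hit) or from the hard disk (a miss). The Python sketch below is only an illustration. The latency values of 0.1 ms for flash and 10 ms for an HDD are round-number assumptions of mine, not measurements of any particular product, and the function name is hypothetical.

```python
def effective_latency_ms(hit_rate: float, flash_ms: float, hdd_ms: float) -> float:
    """Average read latency when a fraction `hit_rate` of reads is served from flash."""
    if not 0 <= hit_rate <= 1:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * flash_ms + (1 - hit_rate) * hdd_ms


if __name__ == "__main__":
    FLASH_MS = 0.1  # assumed flash read latency (illustrative only)
    HDD_MS = 10.0   # assumed hard disk read latency (illustrative only)
    for hit_rate in (0.50, 0.90, 0.99):
        latency = effective_latency_ms(hit_rate, FLASH_MS, HDD_MS)
        print(f"hit rate {hit_rate:.0%}: about {latency:.3f} ms per read")
```

Under these assumptions the misses dominate the average, so each extra point of hit rate that a larger cache buys translates directly into lower latency.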
To keep the processor close to the data, Hadoop uses servers with DAS. And to get the data even closer to the processor, the servers are usually equipped with significant amounts of random access memory (RAM). As an additional benefit, a smart implementation of Hadoop and flash components can reduce the overall server footprint required. Scaling is simplified, with some solutions allowing up to 128 devices to share a very high-bandwidth interface. By contrast, most commodity servers provide eight or fewer SATA ports for disks, which limits expandability.
Hadoop is great, but flash-accelerated Hadoop is best. It’s an effective way, as you work to extract full value from big data, to secure a competitive edge.