
Archive for 2014

 

Infographic: Big Data or Big Confusion? The Key is Open Data Services

Tuesday, July 22nd, 2014 by

When folks refer to “Big Data” these days, what is everyone really talking about? For several years now, Big Data has been THE buzzword used in conjunction with just about every technology issue imaginable. The reality, however, is that Big Data isn’t an abstract concept. Whether you like it or not, you’re already inundated with Big Data. How you source it, what insights you derive from it, and how quickly you act on it will play a major role in determining the course—and success—of your company. To help you get started understanding the key Big Data trends, take a look at this infographic: “60-Second Guide to Big Data and the Cloud.”

[Infographic: 60-Second Guide to Big Data and the Cloud]

Handling the increased volume, variety, and velocity of data (the “3 Vs” shown in the center of the infographic) requires a fundamental shift in the makeup of the platform used to capture, store, and analyze that data. A platform that can handle and capitalize on Big Data successfully requires a mix of relational databases for structured data, NoSQL databases for unstructured data, caching solutions, and Hadoop-style MapReduce tools.

As the need for new technologies to handle the “3 Vs” of Big Data has grown, open source solutions have become the catalysts for innovation, generating a steady stream of new, relevant products to tackle Big Data challenges. Thanks to the skyrocketing pace of innovation in specialized databases and applications, businesses can now choose from a variety of proprietary and open source solutions, depending on the database type and their specific requirements.

Given the wide variety of new and complex solutions, however, it’s no surprise that a recent survey of IT professionals showed that more than 55% of Big Data projects fail to achieve their goals. The most significant challenge cited was a lack of understanding of the range of technologies on the market and of the ability to pilot them. This challenge systematically pushes companies toward a limited set of proprietary platforms that often reduce the choice to a single technology. Seeking one cure-all technology is no longer a realistic strategy. No single technology, such as a database, can solve every problem, especially when it comes to Big Data. Even if one such solution could serve multiple needs, successful companies are always trialing new solutions in the quest to innovate continually and thereby achieve (or maintain) a competitive edge.

Open Data Services and Big Data go hand-in-hand

(more…) «Infographic: Big Data or Big Confusion? The Key is Open Data Services»

Architecting for High Availability in the Cloud

Tuesday, July 22nd, 2014 by

An introduction to multi-cloud distributed application architecture

In this blog post, we’ll explore how to architect a highly available (HA) distributed application in the cloud. For those new to the concept, high availability here means the availability of the application cluster as well as the ability to fail over or scale as needed. The ability to fail over or scale out horizontally to meet demand is what keeps the application highly available. Examples of applications that benefit from HA architectures are database applications, file-sharing networks, social applications, health monitoring applications, and eCommerce websites. So, where do you start? The easiest way to understand the concepts is simply to walk through the 3 steps of a web application setup in the cloud.

Step 1: Setting up a distributed, fault-tolerant web application architecture

In general, the application architecture can be pretty simple: perhaps just a load-balanced web front end running on multiple servers and maybe a NoSQL database like Cassandra. When you’re developing, you can get away with a single server, but once you move into production you’ll want to snapshot your web front end and spread the application across multiple servers. This approach lets you balance traffic and scale out the web front end as needed. In GoGrid, you can do this for free using our Dynamic Load Balancers. Point and click to provision the servers as needed, and then point the load balancer(s) to those servers. The process is simple, so setting up a load-balanced web front end should only take a few minutes. Any data captured or used by the servers will of course be stored in the Cassandra cluster, which is already designed to be HA.
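To make the idea of balancing traffic across several web servers concrete, here is a minimal sketch of round-robin distribution that skips servers marked as down. It is a toy illustration only, not how GoGrid's Dynamic Load Balancers are implemented; the class name, method names, and server names (`web1`, `web2`, `web3`) are all hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy round-robin balancer (illustrative; not a real GoGrid API)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)  # endless rotation over all servers

    def mark_down(self, server):
        # Stop sending traffic to a failed server.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Resume sending traffic to a recovered server.
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # One full pass over the ring is enough to find a healthy server.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")
```

With three servers, successive calls to `next_server()` rotate through all of them; after `mark_down("web2")`, traffic is shared between the remaining two, which is the failover behavior described above.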

image

Deploying the Cassandra cluster. In GoGrid, you can use our 1-Button Deploy™ technology to set up the Cassandra cluster in about 10 minutes. This will provision the cluster for your database. Cassandra is built to be HA, so if one server fails, the load is distributed across the rest of the cluster and your application isn’t impacted. Below is a sample Cassandra cluster. A minimal deployment has 3 nodes to ensure HA, and the cluster is connected via the private VLAN. It’s a good idea to firewall the database servers and eliminate connectivity to the public VLAN. With our production 1-Button Deploy™ solution, the cluster is configured to include an on-demand firewall (for free). In another blog post I’ll discuss how to secure the entire environment: setting up firewalls around your database and your web application as well as working with IDS and IPS monitoring tools and DDoS mitigation services. For the moment, however, your database and web application clusters would look something like this:

image
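One way to see why 3 nodes is a sensible minimum is to work through the quorum arithmetic. The sketch below assumes a replication factor of 3 and quorum-level reads and writes; neither setting is stated in this post, so treat both as assumptions. The formula itself (half the replication factor, rounded down, plus one) is Cassandra's standard definition of a quorum.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum read or write."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


# Assumed setting: every row is replicated to all 3 nodes.
rf = 3
print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

Under these assumptions, a quorum is 2 replicas, so the cluster keeps serving quorum requests with one node down. With only 2 replicas, a quorum would also be 2, and a single failure would block quorum operations.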

(more…) «Architecting for High Availability in the Cloud»

How Big Data can Help Reduce Pollution

Thursday, July 17th, 2014 by

As Big Data continues to become a part of our everyday lives, new uses for the technology emerge that stand to improve the quality of life for millions of people. Such is potentially the case for the citizens of Beijing as one of the major projects in the field starts to take shape: an initiative to eliminate some of the city’s dangerous smog to improve the health of residents. IBM has announced that this plan will roll out over the next 10 years, with an emphasis on transforming the way air quality is analyzed.

Pollution disrupts professional routines and overall health
The pollution in Beijing has not only reduced the life expectancy of those who live in the heart of the city; its constant presence also prevents citizens from enjoying their daily lives. According to a recent piece from Quartz writer Gwynn Guilford, when PM2.5 concentrations grow too high the Chinese government must shut down many of the city’s basic operations, closing schools and factories and limiting the number of cars allowed to drive within city limits.

Here’s where the cloud infrastructure comes in. Because Big Data works best when mass amounts of information are collected and then boiled down to deliver a concise result, IBM intends to use the method to learn more about what pollutes the air around Beijing by monitoring changes in the atmosphere.

“Called ‘Green Horizon,’ the project will focus on air quality management, renewable energy management, and energy optimization among Chinese industries,” Guilford explained. “As part of the initiative, IBM has already signed a partnership with the Beijing government, which is hoping to tap into the company’s expertise to help tackle the city’s air pollution crisis.”

Cloud servers will be used to analyze current air quality in the city and identify potential solutions for alternative energy. Reuters writer David Stanway speculated that the biggest source of pollution is likely still smog from factories and cars, and that IBM can probably use the same Big Data tools that identified the problem to find effective solutions. Possible long-term projects might include solar- and wind-powered installations within the city to reduce energy consumption.

(more…) «How Big Data can Help Reduce Pollution»

Is MapReduce Dead?

Tuesday, July 15th, 2014 by

With the recent announcement by Google of Cloud DataFlow (intended as the successor to MapReduce) and with Cloudera now focusing on Spark for many of its projects, it looks like the days of MapReduce may be numbered. Although the change may seem sudden, it’s been a long time coming. Google wrote the MapReduce white paper 10 years ago, and developers have been using at least one distribution of Hadoop for about 8 years. Users have had ample time to determine the strengths and weaknesses of MapReduce. However, the release of Hadoop 2.0 and YARN clearly indicated that users wanted to live in a more diverse Big Data world.

Earlier versions of Hadoop could be described as MapReduce + HDFS (Hadoop Distributed File System) because that was the paradigm that everything Hadoop revolved around. Because users clamored for interactive access to Hadoop data, the Hive and Pig projects were started. And even though you could write SQL queries with Hive and script in Pig Latin with Pig, under the covers Hadoop was still running MapReduce jobs. That all changed in Hadoop 2.0 with the introduction of YARN. YARN became the resource manager for a Hadoop cluster that broke the dependence between MapReduce and HDFS. Although HDFS still remained as the file system, MapReduce became just another application that can interface with Hadoop through YARN. This change made it possible for other applications to now run on Hadoop through YARN.
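For readers who haven't worked with the paradigm, the sketch below shows the map, shuffle, and reduce phases of a MapReduce-style word count in plain Python. It runs in a single process and uses no Hadoop APIs; the function names are hypothetical and only illustrate the shape of the jobs that Hive and Pig generated under the covers.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling `word_count(["big data big", "data cloud"])` counts "big" and "data" twice each and "cloud" once. In a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data between them, which is where much of the latency of MapReduce jobs comes from.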

Google is not known as a backer of the open source Hadoop ecosystem in the mold of Hortonworks or Cloudera. After all, Google was running its own versions of MapReduce and HDFS (the Google File System), on which these open-source projects are based. Because they are integral parts of Google’s internal applications, Google has the most experience with using these technologies. And although Cloud DataFlow is specifically for use on the Google cloud and appears more like a competitor to Amazon’s Kinesis product, Google is very influential in Big Data circles, so I can see other developers following Google’s lead and adopting a similar technology in place of MapReduce.

Although Google’s Cloud DataFlow may have a thought leadership-type impact, Cloudera’s decision to leverage Spark as the standard processing engine for its projects (in particular, Hive) will have a greater impact on open-source Big Data developers. Cloudera has one of the most popular Hadoop distributions on the market and has partnered with Databricks, Intel, MapR, and IBM to work on Spark integration with Hive. This trend is surprising given Cloudera’s investment in Impala (its SQL query engine), but the company clearly feels that Spark is the future. As little as a year ago, Spark was mostly seen as fast in-memory computing for machine learning algorithms. However, with its promotion to an Apache Top-Level Project in February 2014 and its backing company Databricks receiving $33 million in Series B funding, Spark clearly has greater ambitions. The advent of YARN made it much easier to tie Spark to the growing Hadoop ecosystem. Cloudera’s decision to leverage Spark in Hive and other projects makes it even more important to users of the CDH distribution.

(more…) «Is MapReduce Dead?»

Big Data Revolutionizes the Gaming Industry

Thursday, July 10th, 2014 by

There are few technologies that promise to improve as many different industries as Big Data. Whether it’s medicine or the weather in your backyard, the mass aggregation and analysis of information could result in marked improvement and insight in nearly any field. Cloud computing technology may even change the way we have fun. It has already had an impressive effect on the video gaming industry and will have a great deal of influence on determining what the runaway hits of tomorrow will be. Here are a few small but important insights into the world of the gamer.

Big Data observes the user learn the game
The emerging technology offers those who market and develop video games more insight than ever into what makes players tick, what makes them happy, and what keeps them engaged. Any game’s success is directly connected to the “addiction” factor: what is it about a certain game that makes users feel they can’t stop playing, and even more important, how can that feeling be monetized? To study this question further, each game must be broken down into individual characters, levels, and other gameplay features to determine what works and what doesn’t.

Qubole writer Gil Allouche wrote a piece on how Big Data can be used to decide how difficult individual levels should be in future incarnations of any given game. A cloud server can track how long it takes each player to finish a level, indicating whether early levels are too simple and need to be beefed up in difficulty or are discouraging new users because they’re too challenging. Mass amounts of data can help narrow down the right decision for an individual game.
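As a rough illustration of the level-tuning idea, the sketch below flags levels as too easy or too hard based on the median time players take to finish them. The thresholds (60 and 600 seconds), the function name, and the input format are hypothetical choices for this example, not anything described in Allouche's piece.

```python
from statistics import median


def flag_levels(completion_seconds, too_easy=60.0, too_hard=600.0):
    """Classify each level by the median time players need to finish it.

    completion_seconds maps a level name to a list of completion times
    in seconds. The thresholds are illustrative assumptions.
    """
    flags = {}
    for level, times in completion_seconds.items():
        if not times:
            flags[level] = "no data"
            continue
        typical = median(times)  # median resists a few extreme outliers
        if typical < too_easy:
            flags[level] = "too easy"
        elif typical > too_hard:
            flags[level] = "too hard"
        else:
            flags[level] = "ok"
    return flags
```

Given `{"level 1": [20, 30, 40], "level 2": [120, 300, 200], "level 3": [900, 1200, 700]}`, the first level is flagged as too easy and the third as too hard. A production pipeline would run the same aggregation over millions of play sessions, but the decision logic can stay this simple.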

Increasing sales on cloud-based consoles
For nearly all current gaming systems, the Internet and cloud hosting have been integrated seamlessly to foster more sales as well as engagement between players on massively popular interactive games. Because the gaming store is online and accessible from the console itself, gamers are saved a trip to the store and can download a new experience right to their system in real time, which gives them less time to question a decision and lets them dive right into a purchase. Big Data also allows companies to better “recommend” games and products similar to the ones a gamer is already enjoying, increasing the likelihood of sealing a sale.

Real-world examples
EA Games, one of the largest video game developers and distributors on the planet, announced earlier this year a new commitment to improving its business model and products with the help of Big Data. This will give the company a huge technological advantage, especially when it comes to targeting advertising and maximizing player-to-player interaction in major gaming successes like Activision’s “Call of Duty” franchise and EA’s own “Battlefield” franchise. Silicon Angle, a science and technology blog, reported on the gaming company’s major statement.

(more…) «Big Data Revolutionizes the Gaming Industry»