
Architecting for High Availability in the Cloud

July 22nd, 2014 - 1,523 views

An introduction to multi-cloud distributed application architecture

In this blog, we’ll explore how to architect a highly available (HA) distributed application in the cloud. For those new to the concept, high availability refers to the availability of the application cluster as a whole: the ability to fail over when a server dies, and to scale out horizontally when demand spikes, is what keeps the application available. Examples of applications that benefit from HA architectures are database applications, file-sharing networks, social applications, health monitoring applications, and eCommerce websites. So, where do you start? The easiest way to understand the concepts is simply to walk through the 3 steps of a web application setup in the cloud.

Step 1: Setting up a distributed, fault-tolerant web application architecture

In general, the application architecture can be pretty simple: perhaps just a load-balanced web front end running on multiple servers and maybe a NoSQL database like Cassandra. When you’re developing, you can get away with a single server, but once you move into production you’ll want to snapshot your web front end and spread the application across multiple servers. This approach lets you balance traffic and scale out the web front end as needed. In GoGrid, you can do this for free using our Dynamic Load Balancers. Point and click to provision the servers as needed, and then point the load balancer(s) to those servers. The process is simple, so setting up a load-balanced web front end should only take a few minutes. Any data captured or used by the servers will of course be stored in the Cassandra cluster, which is already designed to be HA.
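
To make this concrete, here's a minimal sketch of how the web front end might talk to the database tier, assuming the DataStax Python driver (cassandra-driver); the node addresses and keyspace name are hypothetical:

# Minimal sketch: a web server's connection to the Cassandra cluster.
# Node IPs on the private VLAN and the keyspace name are hypothetical.
from cassandra.cluster import Cluster

# List every node; the driver spreads requests across them and routes
# around any node that fails, so the web tier needs no special handling.
cluster = Cluster(contact_points=["10.0.0.11", "10.0.0.12", "10.0.0.13"])
session = cluster.connect("app_keyspace")

rows = session.execute("SELECT release_version FROM system.local")
print(rows[0].release_version)

Because every web server holds a session like this, adding another front-end server behind the load balancer requires no database changes at all.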


Step 2: Deploying the Cassandra cluster

In GoGrid, you can use our 1-Button Deploy™ technology to set up the Cassandra cluster in about 10 minutes. This will provision the cluster for your database. Cassandra is built to be HA, so if one server fails, the load is distributed across the cluster and your application isn’t impacted. Below is a sample Cassandra cluster. A minimal deployment has 3 nodes to ensure HA, and the cluster is connected via the private VLAN. It’s a good idea to firewall the database servers and eliminate connectivity to the public VLAN. With our production 1-Button Deploy™ solution, the cluster is configured to include a firewall on demand (for free). In another blog post I’ll discuss how to secure the entire environment: setting up firewalls around your database and your web application as well as working with IDS and IPS monitoring tools and DDoS mitigation services. For the moment, however, your database and web application clusters would look something like this:

[Diagram: load-balanced web servers connected to a 3-node Cassandra cluster over the private VLAN]
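
What makes the cluster tolerate a node failure is replication. As a sketch (the keyspace, table, and addresses are again hypothetical), you might create a keyspace replicated across all 3 nodes, so losing any single server still leaves two complete copies of every row:

# Sketch: a keyspace replicated to all 3 nodes. With replication_factor
# 3, any one server can fail without losing data. Names are hypothetical.
from cassandra.cluster import Cluster

session = Cluster(["10.0.0.11", "10.0.0.12", "10.0.0.13"]).connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app_keyspace
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS app_keyspace.users (
        user_id uuid PRIMARY KEY,
        email   text
    )
""")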


How Big Data Can Help Reduce Pollution

July 17th, 2014 - 2,686 views

As Big Data continues to become a part of our everyday lives, new uses for the technology emerge that stand to improve the quality of life for millions of people. Such is potentially the case for the citizens of Beijing as one of the major projects in the field starts to take shape: an initiative to eliminate some of the city’s dangerous smog to improve the health of residents. IBM has announced that this plan will roll out over the next 10 years, with an emphasis on transforming the way air quality is analyzed.


Pollution disrupts professional routines and overall health
The pollution in Beijing has not only reduced the life expectancy of those who live in the heart of the city, but its constant presence also keeps citizens from enjoying their daily lives. According to a recent piece from Quartz writer Gwynn Guilford, when PM2.5 concentrations grow too high, the Chinese government must shut down many of the city’s basic operations: closing schools and factories and limiting the number of cars that can drive within city limits.

Here’s where cloud infrastructure comes in. Because Big Data works best when massive amounts of information are collected and then boiled down to a concise result, IBM intends to use the method to learn more about what pollutes the air around Beijing by monitoring changes in the atmosphere.

“Called ‘Green Horizon,’ the project will focus on air quality management, renewable energy management, and energy optimization among Chinese industries,” Guilford explained. “As part of the initiative, IBM has already signed a partnership with the Beijing government, which is hoping to tap into the company’s expertise to help tackle the city’s air pollution crisis.”

Cloud servers will be used to analyze current air quality in the city and identify potential solutions for alternative energy. Reuters writer David Stanway speculated that the biggest source of pollution is likely still emissions from factories and cars, and that IBM can probably use the same Big Data tools that identified the problem to find effective solutions. Long-term projects might include solar- and wind-powered installations within the city to reduce energy consumption.


Is MapReduce Dead?

July 15th, 2014 - 2,412 views

With the recent announcement by Google of Cloud DataFlow (intended as the successor to MapReduce) and with Cloudera now focusing on Spark for many of its projects, it looks like the days of MapReduce may be numbered. Although the change may seem sudden, it’s been a long time coming. Google wrote the MapReduce white paper 10 years ago, and developers have been using at least one distribution of Hadoop for about 8 years. Users have had ample time to determine the strengths and weaknesses of MapReduce. However, the release of Hadoop 2.0 and YARN clearly indicated that users wanted to live in a more diverse Big Data world.

[Image: Apache Spark logo]

Earlier versions of Hadoop could be described as MapReduce + HDFS (Hadoop Distributed File System) because everything in Hadoop revolved around that paradigm. Because users clamored for interactive access to Hadoop data, the Hive and Pig projects were started. And even though you could write SQL queries with Hive and script in Pig Latin with Pig, under the covers Hadoop was still running MapReduce jobs. That all changed in Hadoop 2.0 with the introduction of YARN, the cluster resource manager that broke the dependence between MapReduce and HDFS. HDFS remains the file system, but MapReduce is now just one of many applications that interface with the cluster through YARN, which is what cleared the way for other processing engines to run on Hadoop.
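
To see what “running MapReduce jobs under the covers” actually means, here is the canonical word count written as a Hadoop Streaming job in Python. This is a sketch, not production code; the file names, paths, and launch command are assumptions that vary by distribution.

# ---- mapper.py: emit "word<TAB>1" for every word on stdin ----
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")

# ---- reducer.py: sum the counts for each word ----
# Hadoop sorts mapper output by key before the reduce phase, so all
# counts for a word arrive together and one pass suffices.
import sys

current_word, total = None, 0
for line in sys.stdin:
    word, _, count = line.strip().partition("\t")
    if not word:
        continue
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{total}")
        current_word, total = word, 0
    total += int(count)
if current_word is not None:
    print(f"{current_word}\t{total}")

# Launched with something like (jar path varies by distribution):
#   hadoop jar hadoop-streaming.jar -input /data/in -output /data/out \
#     -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py

Every Hive query or Pig script ultimately compiled down to jobs shaped like this pair, which is exactly the rigidity YARN was introduced to relax.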

Google is not known as a backer of the open-source Hadoop ecosystem in the mold of Hortonworks or Cloudera. After all, Google was running its own versions of MapReduce and HDFS (the Google File System), on which these open-source projects are based. Because they are integral parts of Google’s internal applications, Google has the most experience with using these technologies. And although Cloud DataFlow is specifically for use on the Google cloud and appears more like a competitor to Amazon’s Kinesis product, Google is very influential in Big Data circles, so I can see other developers following Google’s lead and adopting similar technology in place of MapReduce.

Although Google’s Cloud DataFlow may have a thought-leadership impact, Cloudera’s decision to make Spark the standard processing engine for its projects (in particular, Hive) will have a greater impact on open-source Big Data developers. Cloudera has one of the most popular Hadoop distributions on the market and has partnered with Databricks, Intel, MapR, and IBM to work on Spark integration with Hive. The move is surprising given Cloudera’s investment in Impala (its SQL query engine), but the company clearly feels that Spark is the future. As little as a year ago, Spark was mostly seen as fast in-memory computing for machine-learning algorithms. However, with its promotion to an Apache Top-Level Project in February 2014 and its backing company Databricks receiving $33 million in Series B funding, Spark clearly has greater ambitions. The advent of YARN made it much easier to tie Spark into the growing Hadoop ecosystem, and Cloudera’s decision to leverage Spark in Hive and other projects makes it even more important to users of the CDH distribution.
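
For contrast, here is the same word count as a Spark job in PySpark, a sketch assuming a local master and hypothetical HDFS paths. The whole pipeline lives in one program, and intermediate results stay in memory instead of being written back to disk between phases:

# Sketch: word count on Spark. Master setting and paths are hypothetical.
from pyspark import SparkContext

sc = SparkContext("local[*]", "WordCount")
counts = (sc.textFile("hdfs:///data/in")
            .flatMap(lambda line: line.split())   # one record per word
            .map(lambda word: (word, 1))          # pair each word with 1
            .reduceByKey(lambda a, b: a + b))     # sum counts per word
counts.saveAsTextFile("hdfs:///data/out")
sc.stop()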

[Image: the Spark software stack]


Big Data Revolutionizes the Gaming Industry

July 10th, 2014 - 3,370 views

There are few technologies that promise to improve as many different industries as Big Data. Whether it’s medicine or the weather in your backyard, the mass aggregation and analysis of information can yield marked improvement and insight on nearly anything. The technology may even change the way we have fun: it has already had an impressive effect on the video gaming industry and will have a great deal of influence in determining the runaway hits of tomorrow. Here are a few small but important insights into the world of the gamer.


Big Data observes how users learn the game
The emerging technology offers those marketing and developing video games more insight than ever into what makes players tick, what makes them happy, and what keeps them engaged. Any game’s success is directly connected to the “addiction” factor: what is it about a certain game that makes users feel they can’t stop playing, and, even more important, how can that feeling be monetized? To study this, each game must be stripped down to individual characters, levels, and other gameplay features to determine what works and what doesn’t.

Qubole writer Gil Allouche wrote a piece on how Big Data can be used to decide how difficult individual levels should be in future incarnations of any given game. A cloud server can track how long it takes each player to finish a level, indicating whether early levels are too simple and need to be beefed up in difficulty, or are so challenging that they discourage new users. Massive amounts of data can help narrow down the right decision for an individual game.
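
As an illustration of the idea (not Allouche’s actual method), here is a small sketch that flags mistuned levels from completion-time telemetry; the events and thresholds are invented:

# Sketch: flag levels whose completion times suggest they're mistuned.
# The telemetry events and the 60s/600s thresholds are hypothetical.
from statistics import median

# (player_id, level, seconds_to_complete) events from game telemetry
events = [("p1", 1, 42), ("p2", 1, 35), ("p1", 2, 910), ("p2", 2, 1130)]

by_level = {}
for _, level, seconds in events:
    by_level.setdefault(level, []).append(seconds)

for level, times in sorted(by_level.items()):
    m = median(times)
    if m < 60:
        print(f"level {level}: median {m}s, may be too easy")
    elif m > 600:
        print(f"level {level}: median {m}s, may be discouraging new players")

At real scale the same aggregation would run over millions of events, but the decision logic stays this simple: the sheer volume of data is what makes the median trustworthy.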

Increasing sales on cloud-based consoles
For nearly all current gaming systems, the Internet and cloud hosting have integrated seamlessly to foster more sales as well as engagement between players on massively popular interactive games. With the game store hosted online and accessible from the console itself, gamers are spared a trip to the store and can download a new experience to their system in real time, leaving less time to second-guess a purchase. Big Data also allows companies to better “recommend” games similar to the ones a gamer is already enjoying, increasing the likelihood of sealing a sale, as sketched below.
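
One minimal way to build such a recommendation, sketched here with hypothetical purchase data, is co-occurrence counting: rank the titles most often owned alongside the game a player already has.

# Sketch: "players who own X also own Y" via co-occurrence counts.
# The game libraries below are invented sample data.
from collections import Counter
from itertools import combinations

libraries = [
    {"Battlefield 4", "Titanfall", "FIFA 14"},
    {"Battlefield 4", "Titanfall"},
    {"FIFA 14", "Madden NFL 25"},
]

pairs = Counter()
for library in libraries:
    for a, b in combinations(sorted(library), 2):
        pairs[(a, b)] += 1
        pairs[(b, a)] += 1

def recommend(game, k=2):
    # Titles most frequently co-owned with `game`, best first.
    scores = Counter({b: n for (a, b), n in pairs.items() if a == game})
    return [title for title, _ in scores.most_common(k)]

print(recommend("Battlefield 4"))  # ['Titanfall', 'FIFA 14']

Production systems use far richer signals (playtime, genre, social graph), but the principle of mining many players’ behavior to predict one player’s taste is the same.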

Real-world examples
EA Games, one of the largest video game developers and distributors on the planet, announced earlier this year a new commitment to improving its business model and products with the help of Big Data. This will give the company a huge technological advantage, especially when it comes to targeting advertising and maximizing player-to-player interaction in major gaming successes like Activision’s “Call of Duty” franchise and EA’s own “Battlefield” franchise. Silicon Angle, a science and technology blog, reported on the gaming company’s major statement.


3 Unusual Uses of Big Data

July 4th, 2014 - 3,784 views

When we think of Big Data, we tend to think big picture: massive amounts of information used to accomplish whatever goal a business or individual may have, quickly revolutionizing how we get things done. Although this impression may be true, it doesn’t place enough focus on those who are innovating within the field. Here are three organizations that demonstrate the fascinating range of Big Data technology and its results.


Pricing outdoor marketing
Route, an outdoor media analytics company, has thrown itself fully into Big Data in an attempt to question, and ultimately revolutionize, the standards for pricing advertising on conventional media like billboards, bench ads, and the sides of transportation vehicles. In previous years, owners of these spaces have charged companies “per impression,” that is, for every time a viewer sees the advertisement, although there has never been a way to strictly quantify those impressions. Using cloud infrastructure, Route hopes to change that by gathering live analytics to measure what the impression rate actually is.
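
Once impressions can actually be measured, the pricing itself is simple arithmetic. A sketch with invented numbers, assuming a standard CPM (cost per thousand impressions) model:

# Sketch: impression-based pricing for one billboard. All numbers are
# hypothetical; real contracts layer on demographics and visibility.
weekly_impressions = 180_000  # measured via eye tracking and GPS panels
cpm = 4.50                    # assumed dollars per 1,000 impressions

weekly_price = weekly_impressions / 1000 * cpm
print(f"${weekly_price:,.2f} per week")  # $810.00 per week

The hard part, and the reason Route needs Big Data, is producing a defensible weekly_impressions figure in the first place.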

E-consultancy writer Ben Davis described how the company went about the study: “360,000 frames (bits of ad space) are analyzed, both their visibility, with eye tracking studies, and the audience size and demographic that come into contact with the ads,” he explained. “28,000 people were interviewed and then tracked across the U.K. by GPS. Part of this involves traffic studies, too.” This level of precision will allow Route to use Big Data to its advantage to justify pricing.

More accurate, collaborative weather forecasting
Collecting as much information as possible has always been the name of the game in weather forecasting, but never before has cloud server hosting made it possible to crowd-source that data. WeatherSignal, an application launched for Android in 2013, gives its users the opportunity to collect atmospheric data with sensors already installed in their devices, according to an enthusiastic article in Scientific American. Though many of the readings are gathered from phones with varying degrees of sensor capability, the application’s relatively spot-on forecasts are a great example of how the Big Data model operates: massive amounts of “dirty data” rather than fewer readings of the atmosphere taken with more advanced equipment. With users offering their devices as a free forecasting tool, it’s a difficult source of information to resist.
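
The “dirty data” intuition is easy to demonstrate: average enough noisy phone barometers and the estimate approaches a single accurate instrument. A simulated sketch:

# Sketch: 10,000 noisy phone barometer readings, averaged.
# The true pressure and the noise level (sigma = 5 hPa) are invented.
import random

true_pressure = 1013.25  # hPa
phones = [true_pressure + random.gauss(0, 5) for _ in range(10_000)]

estimate = sum(phones) / len(phones)
print(f"crowd estimate: {estimate:.2f} hPa (true: {true_pressure})")

With 10,000 readings, the error of the average shrinks to a few hundredths of a hectopascal even though any single phone may be off by 10 or more.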

Optimizing personal data and lifestyles
Advocates of Big Data in the office will be happy to learn they can take it home with them by making use of a number of devices aimed at optimizing a person’s daily routine. An excellent example is the UP wristband by Jawbone, which tracks daily activity to help users build a more structured, healthy lifestyle. The wristband collects data while its wearer walks, sleeps, and eats, then syncs with a companion application that synthesizes the information into a concise report on the actions taken throughout each day. Bernard Marr, a Big Data analytics specialist, publicized some of the fascinating features of the device in a piece on LinkedIn.
