In IT departments around the globe, CTOs, CIOs, and CEOs are asking the same question: “How can we use Big Data technologies to improve our platform operations?” Your particular role could be responsible for solving for a wide variety of use cases ranging from real-time monitoring and alerting to platform operations analysis or behavioral targeting and marketing operations. The solutions for each of these use cases vary widely as well. But no matter which Big Data solution you choose, make sure you avoid the following 3 pitfalls.
Pitfall #1: Assuming a single solution fits all use cases
In a recent post, Liam Eagle of 451 Research looked at GoGrid’s Big Data product set, which is purpose-built for handling different types of workloads. He noted that variety is the key here. There isn’t a single one-size-fits-all solution for all your use cases. At GoGrid, for example, many of our Big Data customers are using 3 to 5 solutions, depending on their use case, and their platform infrastructure typically spans a mix of cloud and dedicated servers running on a single VLAN. So when you’re evaluating solutions, it makes sense to try out a few, run some tests, and ensure you have the right solution for your particular workload. It’s easy for an executive to tell you, “I want to use Hadoop,” but it’s your job that’s on the line if Hadoop doesn’t meet your specific needs.
As I’m sure you already know, Big Data isn’t just about Hadoop. For starters, let’s talk about NoSQL solutions. The following table lays out a few options and their associated use cases to help illustrate the point.
| Solution | Common Use Cases | Pros and Cons |
And these are just a few examples. Managing Big Data goes way beyond NoSQL and Hadoop. In fact, it probably includes ingestion technology like Flume, plus NoSQL, MapReduce, an RDBMS, caching, and associated hooks into various applications, depending on how you want to surface information. This leads us to the second pitfall you’ll want to avoid.
Pitfall #2: Blowing the IT budget by over-planning for scale
Big Data is after all “big,” right? The answer is, it depends. When you’re developing your platform, Big Data is actually not so big. And during the initial launch, you probably don’t actually need as much capacity as you might think you do. Herein lies the conundrum. If you over-plan and blow your budget, you could lose your job. If you under-plan and the application goes down, you could lose your job. So, what’s a smart person to do? The answer is pretty simple: Use the cloud or at least a mix of cloud and dedicated infrastructure so you can meet spikes in demand and avoid over-provisioning.
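To make the trade-off concrete, here is a rough sketch that compares sizing dedicated capacity for the busiest hour against running a dedicated baseline and bursting to cloud servers during spikes. Every number in it (the demand curve, the hourly rates, the baseline size) is a hypothetical assumption chosen for illustration, not GoGrid pricing or customer data.

```python
def peak_provisioned_cost(hourly_demand, dedicated_rate):
    """Cost of buying enough dedicated servers to cover the busiest hour,
    and paying for all of them in every hour."""
    return max(hourly_demand) * dedicated_rate * len(hourly_demand)


def hybrid_cost(hourly_demand, baseline, dedicated_rate, cloud_rate):
    """Cost of a fixed dedicated baseline plus cloud servers that are
    paid for only in the hours when demand exceeds the baseline."""
    dedicated = baseline * dedicated_rate * len(hourly_demand)
    burst_server_hours = sum(max(0, d - baseline) for d in hourly_demand)
    return dedicated + burst_server_hours * cloud_rate


# Hypothetical day: 10 servers needed for 20 hours, 40 servers for a 4-hour spike.
demand = [10] * 20 + [40] * 4

# Hypothetical rates per server-hour; cloud is assumed to cost more per hour.
DEDICATED_RATE = 1.0
CLOUD_RATE = 1.5

peak = peak_provisioned_cost(demand, DEDICATED_RATE)
hybrid = hybrid_cost(demand, baseline=10,
                     dedicated_rate=DEDICATED_RATE, cloud_rate=CLOUD_RATE)

print(f"Sized for peak: {peak:.2f}")
print(f"Hybrid:         {hybrid:.2f}")
```

Under these assumed numbers the hybrid setup costs less than half as much, even though each cloud server-hour is priced higher. The result flips if spikes last most of the day, so it’s worth plugging in your own demand curve and rates before deciding.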
At GoGrid, our most successful customers use a mix of dedicated and cloud infrastructure for various reasons: some want to develop in the cloud and move to dedicated infrastructure to accommodate specific configuration requirements, and some want to use our SSD Cloud Servers and Block Storage-optimized network fabric while keeping parts of their operation on single-tenant dedicated servers. By using a flexible mix of infrastructure, our customers are able to meet stringent HIPAA and PCI compliance standards and take advantage of the elastic nature of cloud infrastructure to ensure they always have the appropriate capacity to meet their needs. If you do the same, that unplanned spike of data coming into the system suddenly becomes no problem at all. And if your development cycles take longer than you expected, you haven’t blown the budget because you didn’t purchase thousands of servers’ worth of capacity up-front. By selecting the right partners, you’re able to create the flexible environment your business needs, which leads me to the third and final pitfall you’ll want to avoid.
Pitfall #3: Selecting a proprietary service provider rather than an Open Data Services provider
Evaluation of Big Data solutions typically goes something like this. Line up 3 to 5 solutions that solve for a particular use case, come up with evaluation and testing criteria, run a proof of concept (POC) or two or three, and then select the highest-performing solution. One piece of the evaluation criteria that’s often overlooked, however, is the potential lock-in and use of proprietary technology that only runs on one platform. Here are the top 3 disadvantages of lock-in cited by our customer base:
- The inability to scale across multiple service providers can bust your budget.
Translation: You could lose your job because your service provider is too expensive and you’re locked into their platform.
- A lack of developers who understand and can support the solution makes hiring hard.
Translation: The one developer you hired to support the solution left the company, so now you can’t manage the solution, the platform fails, and you lose your job.
- Inconsistent architecture across service providers and on-prem leads to operational overhead.
Translation: You might not lose your job for creating additional operational overhead, but you’d probably like to keep more of your budget if you can.
On the flip side, selecting an Open Data Services (ODS) provider means you gain the following 3 benefits:
- You can easily scale across multiple service providers because the technology the ODS provider supports is open and can be translated to other providers. Being able to scale like this also makes highly available operations like disaster recovery and failover much easier because you don’t have to re-invent the wheel when performing them.
- Your hiring challenge becomes a thing of the past because when technology is supported by thriving communities, there are lots of people out there you can target and hire as you need them.
- You enjoy greater operational efficiency, which means you can spend less time managing multiple platforms and more time focusing on the projects that will help you be more competitive and ultimately drive more revenue.
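One way to keep lock-in from being overlooked is to score it alongside performance when you compare POC results. The sketch below ranks candidates with a simple weighted scoring matrix. The criteria names, weights, candidate names, and scores are all hypothetical placeholders, not measurements of any real product.

```python
def weighted_score(scores, weights):
    """Weighted average of a candidate's per-criterion scores (0-10 scale)."""
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def rank_candidates(candidates, weights):
    """Return candidate names ordered from best to worst weighted score."""
    return sorted(candidates,
                  key=lambda name: weighted_score(candidates[name], weights),
                  reverse=True)


# Hypothetical weights: performance still matters most, but portability
# (freedom from lock-in) and the size of the talent pool also count.
WEIGHTS = {"performance": 5, "portability": 3, "talent_pool": 2}

# Hypothetical POC results for two made-up candidates.
CANDIDATES = {
    "proprietary_store": {"performance": 9, "portability": 2, "talent_pool": 3},
    "open_store": {"performance": 8, "portability": 9, "talent_pool": 8},
}

for name in rank_candidates(CANDIDATES, WEIGHTS):
    print(name, round(weighted_score(CANDIDATES[name], WEIGHTS), 2))
```

With performance as the only criterion, the made-up proprietary candidate would win this comparison. Once portability and hiring are weighted in, the open candidate comes out ahead, which is the point of putting lock-in into the evaluation criteria up front.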
When you finally do decide you’re ready to evaluate and/or run your Big Data solution, just remember that GoGrid makes it easy. We offer the largest set of open data solutions and purpose-built infrastructure on the market today. But don’t just take my word for it: check out our solutions for yourself. And if you need help performing tests, setting up your evaluation criteria, or selecting a solution, feel free to chat with one of our team members. We’re happy to help.
For even more information on Big Data, you can also read our new “essentials” white paper.
Latest posts by Kole Hicks
- Comparing Cloud Infrastructure Options for Running NoSQL Workloads - April 11, 2014
- How to Deploy a Riak Cluster in 5 Minutes on GoGrid - January 31, 2014
- Implementing Big Data in the Cloud: 3 Pitfalls that Could Cost You Your Job - November 25, 2013