Posts Tagged ‘Data’

Personal cloud computing driven by apps, data

Monday, February 25th, 2013 by

The way people acquire and use computing resources is rapidly changing. In the past, individuals relied on personal computers, or PCs, for virtually every task, since those devices offered nearly anytime access to sensitive or important digital assets. Today’s world is much different, however. Since the dawn of the smartphone and tablet, the computing endpoint market has become far more diverse, and instead of depending on a single PC, people have turned to cloud computing.

The personal cloud model is gaining momentum outside the workplace because of its ability to provide seamless connectivity between information and computing devices, regardless of make, model or platform type, according to a report by Gartner. As consumers continue to leverage a multitude of smartphones, tablets and other gadgets, they will demand an independent environment separate from any particular endpoint.

“Cognizant computing evolves the connected device and personal cloud service into an activity of seamless and frictionless integration connected to sensor-driven ‘invisible’ devices that are optimized for a particular set of functions,” said Michael Gartenberg, research director at Gartner. “This data and information can then be tied to other services across larger ecosystems, platforms and operating systems.”

The natural evolution of computing
Cloud infrastructure and cognizant environments represent the next natural step in computing’s transformation into a more reliable and convenient service, Gartner noted. Rather than solely being driven by the proliferation of smartphones, tablets and other devices, computing’s future relies on the ongoing use of applications that can be accessed via multiple mediums.

Analysts said this increased use of advanced software has made solutions aware of their surroundings, providing users with more relevant information in a timely manner. This evolution is especially important as it makes its way into the business world, helping companies capture, analyze and leverage data effectively. Because software is now ubiquitous and not reliant on a particular device, individuals don’t have to make a long-term commitment to a single platform.

(more…) «Personal cloud computing driven by apps, data»

The Big Data Revolution – Part 2 – Enter the Cloud

Wednesday, March 21st, 2012 by

In Part 1 of this Big Data series, I provided a background on the origins of Big Data.

But What is Big Data?

The problem with the term “Big Data” is that it’s used in a lot of different ways. One definition is that Big Data is any data set that is too large for on-hand data management tools. According to Martin Wattenberg, a scientist at IBM, “The real yardstick … is how it [Big Data] compares with a natural human limit, like the sum total of all the words that you’ll hear in your lifetime.” Collecting that data is a solvable problem; making sense of it, particularly in real time, is the challenge that technology tries to solve. This new class of technology is often grouped under the label “NoSQL” and includes distributed databases that are a departure from relational databases like Oracle and MySQL. These systems are specifically designed to parallelize computation, distribute data, and tolerate faults across a large cluster of servers. Some examples of NoSQL projects and software are Hadoop, Cassandra, MongoDB, Riak and Membase.

The techniques vary, but there is a definite distinction between SQL relational databases and their NoSQL brethren. Most notably, NoSQL systems share the following characteristics:

  • Do not use SQL as their primary query language
  • May not require fixed table schemas
  • May not give full ACID guarantees (Atomicity, Consistency, Isolation, Durability)
  • Scale horizontally
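
To make the schema point concrete, here is a minimal sketch using the MongoDB shell from the command line; the analytics database, the events collection, and the document fields are hypothetical, and a local MongoDB instance is assumed:

  # two documents with different fields land in the same collection; no schema to declare
  mongo analytics --eval 'db.events.insert({user: 101, action: "click", page: "/pricing"})'
  mongo analytics --eval 'db.events.insert({user: 102, action: "search", terms: ["cloud", "backup"], results: 27})'

  # query on a field that only some documents contain
  mongo analytics --eval 'printjson(db.events.findOne({action: "search"}))'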

Because NoSQL systems relax full ACID guarantees, they are used when performance and real-time results are more important than strict consistency. For example, if a company wants to update its website in real time based on an analysis of how a particular user interacts with the site, it will most likely turn to NoSQL to solve this use case.

However, this does not mean that relational databases are going away. In fact, it is likely that in larger implementations, NoSQL and SQL will function together. Just as NoSQL was designed to solve a particular use case, so were relational databases. Relational databases excel at organizing structured data and are the standard for serving up ad-hoc analytics and business intelligence reporting. In fact, Apache Hadoop even has a separate project called Sqoop that is designed to link Hadoop with structured data stores. Most likely, those who implement NoSQL will maintain their relational databases for legacy systems and for reporting off of their NoSQL clusters.
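
As a hedged sketch of what that bridge looks like in practice (the connection string, credentials, table name and HDFS path below are placeholders, not a real system):

  # pull a relational table into HDFS so Hadoop jobs can work on it
  sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username reporting -P \
      --table orders \
      --target-dir /user/etl/orders \
      --num-mappers 4
  # -P prompts for the database password; --num-mappers controls how many parallel copy tasks run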

(more…) «The Big Data Revolution – Part 2 – Enter the Cloud»

The Big Data Revolution – Part 1 – The Origins

Tuesday, March 20th, 2012 by

For many years, companies collected data from various sources that often found its way into relational databases like Oracle and MySQL. However, the rise of the internet, Web 2.0, and more recently social media brought an enormous increase not only in the amount of data created, but also in the types of data. No longer was data relegated to types that fit easily into standard data fields – it now came in the form of photos, geographic information, chats, Twitter feeds and emails. The age of Big Data is upon us.

A study by IDC titled “The Digital Universe Decade” projects a 45-fold increase in annual data by 2020. In 2010, the amount of digital information was 1.2 zettabytes, and 1 zettabyte equals 1 trillion gigabytes. To put that in perspective, 1.2 zettabytes is the equivalent of a full-length episode of “24” running continuously for 125 million years, according to IDC. That’s a lot of data. More importantly, this data has to go somewhere, and the report projects that by 2020, more than a third of all digital information created annually will either live in or pass through the cloud. With all this data being created, the challenge will be to collect and store it, and to analyze what it all means.

Business intelligence (BI) systems have always had to deal with large data sets. Typically the strategy was to pull in “atomic”-level data at the lowest level of granularity, then aggregate the information into a consumable format for end users. In fact, it was preferable to have a lot of data, since you could also “drill down” from the aggregation layer to get at the more detailed information as needed.

Large Data Sets and Sampling

Coming from a data background, I find that dealing with large data sets is both a blessing and a curse. One product that I managed analyzed carrier share of wireless numbers. The number of wireless subscribers in 2011, according to CTIA, was 322.9 million and growing. While that doesn’t seem like a lot of data at first, each wireless number was a unique identifier, and there could be any number of activities associated with each one. The amount of information generated from each number could therefore be extensive, especially since the key element was seeing changes over time. For example, after 2003, mobile subscribers in the United States were able to port their numbers from one carrier to another. This is of great importance to market research, since a shift from one carrier to another would indicate churn and also impact the market share of carriers in that Metropolitan Statistical Area (MSA).

Given that it would take a significant amount of resources to poll every household in the United States, market researchers often employ a technique called sampling. This is a statistical technique in which a panel that represents the population is used to estimate the activity of the overall population you want to measure. It is a sound scientific technique if done correctly, but it’s not without its perils. For example, it’s often possible to get +/- 1% error at 95% confidence for a large population, but what happens once you start drilling down into more specific demographics and geographies? The risk is not only having enough sample (you can’t just have one subscriber represent the activity of a large group, for example) but also ensuring that the sample is representative (is the subscriber you are measuring representative of the population you want to measure?). Sampling errors are a classic problem of working with panelists. It’s fairly difficult to be completely certain that your sample is representative unless you’ve already measured the entire population (using it as a baseline), but if you’ve already done that, why bother sampling?
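
As a back-of-the-envelope illustration using the standard margin-of-error formula for a proportion, z * sqrt(p(1-p)/n): at 95% confidence (z ≈ 1.96) and the worst case of p = 0.5, hitting +/- 1% requires roughly n = 1.96² × 0.25 / 0.01² ≈ 9,600 panelists for the overall population. But if, say, a single MSA or demographic slice accounts for only 1% of that panel (about 96 people), the margin of error for that subgroup balloons to roughly +/- 10%, which is exactly the drill-down problem described above.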

(more…) «The Big Data Revolution – Part 1 – The Origins»

How To Optimize Your Database Backups and Text File Compression with pbzip2 and pigz

Thursday, February 9th, 2012 by

Recently, GoGrid was examining performance enhancements on several internal processes; among these enhancements was switching from standard gzip to “pigz”. Since I had never heard of this “pigz”, I was intrigued by its supposed “parallel” implementation of gzip, meaning it uses all available CPUs/cores, unlike gzip. This prompted me to ask, “I wonder if there is a parallel implementation of bzip2 as well?”, and there began my endeavor.

pigz and pbzip2 are multi-threaded (SMP) implementations of their single-threaded counterparts, gzip and bzip2. They are both actively maintained and are fully compatible with all current gzip and bzip2 archives.

If you’re like me, you might’ve stayed away from gzip or bzip2 because they are single-threaded. If I try to compress a, let’s say, 2GB file, the system becomes rather sluggish; the reason being that the “compression tool of choice” consumes nearly all of one core on today’s multi-core, multi-CPU systems, creating an uneven load between the cores and making the CPU operate very inefficiently.

In this example I have a .tar file containing several databases, totaling 1.3GB. The system in question is a GoGrid dedicated server with 8 cores; it is a production database server with a load average of around 1.

Using bzip2, the file took approximately 6 minutes and 30 seconds to compress. Yikes!
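
A minimal sketch of that comparison, assuming a placeholder archive name of databases.tar (the actual file and timings on your own server will differ):

  # single-threaded baseline; -k keeps the original .tar
  time bzip2 -k databases.tar

  # multi-threaded equivalents on the same 8-core box
  # -p sets the core count; -f overwrites the .bz2 left behind by the run above
  time pbzip2 -f -p8 -k databases.tar
  time pigz -p 8 -k databases.tar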

(more…) «How To Optimize Your Database Backups and Text File Compression with pbzip2 and pigz»

GoGrid Says: We’ll Load Your Data Into the Cloud

Thursday, August 27th, 2009 by

You know about Cloud Computing, right? And you know that GoGrid is probably one of the easiest onramps to hosting within the Cloud, with our award-winning web-based portal, private server images called MyGSI, point-and-click deployments of Windows & Linux cloud servers, F5 load balancers and Cloud Storage. So, how can we further lower the barrier to entry to the Cloud? How about by providing a service that lets you ship us physical media, like hard drives crammed full of data, that you want in your GoGrid cloud? Let us load it for you into our Cloud Storage solution!

GoGrid Cloud Storage

First, you might be asking, what is GoGrid’s Cloud Storage anyway? It’s pretty simple, actually. It’s an instantly scalable and reliable file-level backup and storage service for Windows and Linux cloud servers running in the GoGrid cloud. You mount GoGrid’s Cloud Storage over a secure private network and use common transfer protocols like SCP, FTP, SAMBA/CIFS and RSYNC to move your data in and out. Your storage scales dynamically, on the fly, and you only pay for what you use.
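
As a hedged example of what moving data in can look like from a cloud server (the hostname and paths below are placeholders, not GoGrid’s actual Cloud Storage endpoint):

  # push a local backup directory over rsync: archive mode, verbose, compressed
  rsync -avz /var/backups/ user@storage.example.internal:/backups/

  # or copy a single database dump over scp
  scp nightly-dump.sql.gz user@storage.example.internal:/backups/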

Another nice thing: we give you an initial 10 GB of space for FREE! Each additional GB is $0.15/GB per month. More info can be found on the GoGrid product page as well as in this (older) blog post.

The New Data Transfer Service

(more…) «GoGrid Says: We’ll Load Your Data Into the Cloud»