<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Big Data Made Simple - One source. Many perspectives. &#187; SQL</title>
	<atom:link href="http://bigdata-madesimple.com/category/tech-and-tools/sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://bigdata-madesimple.com</link>
	<description>One source. Many perspectives.</description>
	<lastBuildDate>Sat, 08 Jul 2017 05:11:57 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.2</generator>
		<item>
		<title>9 useful resources for those who wants to know more about SQL</title>
		<link>http://bigdata-madesimple.com/9-useful-resources-for-those-who-wants-to-know-more-about-sql/</link>
		<comments>http://bigdata-madesimple.com/9-useful-resources-for-those-who-wants-to-know-more-about-sql/#comments</comments>
		<pubDate>Fri, 30 Jun 2017 05:30:21 +0000</pubDate>
		<dc:creator>Baiju NT</dc:creator>
				<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://bigdata-madesimple.com/?p=21639</guid>
		<description><![CDATA[<p>SQL, Structured Query Language, is the primary language responsible for management of data and data structures within a...</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/9-useful-resources-for-those-who-wants-to-know-more-about-sql/">9 useful resources for those who wants to know more about SQL</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>SQL, Structured Query Language, is the primary language responsible for management of data and data structures within a relational database management system. In other words, SQL is a language used to communicate with a database. It is also one of the most sought-after skills among employers. Learning SQL opens doors to career success and it will look great on your resume. Here are some useful resources you can use to make the learning process easier.</p>
<p>1. <a href="https://www.w3schools.com/sql/default.asp">W3Schools – SQL Tutorial</a></p>
<p>W3Schools is one of the largest web developer sites that you can find on the internet. The website provides a multitude of tutorials you can use to develop your skills and SQL is one of them. On this website, you can learn how to use SQL in SQL Server, MySQL, Oracle, and other systems. What’s practical about this site is the quiz feature where you can test your progress, identify strengths and weaknesses, and improve the learning experience.</p>
<p>2. <a href="https://sqlbolt.com/"><b>SQLBolt</b></a></p>
<p>SQLBolt is, essentially, a series of interactive lessons and exercises created to help users learn SQL easily. The lessons and topics found on this site are comprehensive and cover all the important details of using SQL. This resource is particularly useful for beginners, with 19 easy but important lessons to work through before you move on to more complex details. The site has also started adding intermediate lessons; at this point only 3 are available, but more can be expected soon.</p>
<p>3. <a href="https://academy.vertabelo.com/"><b>Vertabelo Academy</b></a></p>
<p>Vertabelo Academy provides interactive SQL courses right in your browser. Each course features extensive practice material that you can use to enhance your skills and build confidence. The website offers three types of courses: free, paid, and beta (works in progress that you can use for free to practice). Vertabelo Academy teaches you about SQL queries, table creation, and data management.</p>
<p>Using this website is easy; you start each item with instructions and examples and when you’re ready, it’s time to move on to exercises. Here, you also have the opportunity to discuss the course with other users and trade experiences.</p>
<p>4. <a href="https://www.codecademy.com/learn/learn-sql"><b>Codecademy</b></a></p>
<p>Codecademy is an online platform that provides various free coding courses in programming languages. The site is dedicated to providing an optimal learning experience and you can use it to learn how to manage data with SQL. Codecademy’s Learn SQL course is free and interactive. It covers database essentials including queries, tables, aggregate functions, and more advanced database queries, among other things.</p>
<p>Each lesson is divided into three panels containing a description of the exercise, an interactive SQL command line, and a visual representation of the database schema with the result of the query. Check your knowledge with a quiz and see how far you’ve come. To take the course, you have to register using your email address or a Facebook or Google account.</p>
<p>5. <a href="https://www.udemy.com/courses/it-and-software/other/sql-courses/"><b>Udemy</b></a></p>
<p>Udemy is a great online resource with a mission to “help anyone learn anything”. SQL courses on this site are paid, but frequent promotions bring prices down and you can find an ideal course regardless of your budget. What’s beneficial about courses at Udemy is that you can opt for the one that perfectly matches your current skills and SQL knowledge.</p>
<p>6. <a href="https://www.khanacademy.org/computing/computer-programming/sql"><b>Khan Academy “Intro to SQL”</b></a></p>
<p>Khan Academy offers a personalized learning dashboard, a lot of practice exercises, and micro-lectures in the form of YouTube videos. This allows you to study at your own pace and develop SQL skills gradually. Unlike many other resources, you can adapt this one to your needs and preferences.</p>
<p>The entire course contains 5 parts starting with basics and leading you all the way up to more advanced lessons. You don’t have to register in order to watch videos, but if you have some questions or want to take part in discussions, then you will have to sign in.</p>
<p>7. <a href="http://sqlzoo.net/"><b>SQLZoo</b></a></p>
<p>SQLZoo is ideal for people who prefer extensive personal support and a more thorough approach to lessons. The lessons are interactive and the site is free to use. The SQL course comes with live interpreters and interactive exercises for different types of databases. All tutorials come in a step-by-step format; you also have the option to use live chat and test your knowledge with a quiz, and the content is available without registration.</p>
<p>8. <a href="https://lagunita.stanford.edu/courses/DB/2014/SelfPaced/about"><b>Stanford University</b></a></p>
<p>Yes, THAT Stanford University provides an online self-paced course with video tutorials that you can use to learn basic SQL skills. To get started, all you have to do is to select the course and there are plenty of options including querying databases, SQL advanced features such as indexes and transactions, constraints and triggers, online analytical processing, recursion in SQL, and many others.</p>
<p>9. <a href="http://www.sql-tutorial.ru/"><b>SQL Problems and Solutions</b></a></p>
<p>This unique platform acts as an interactive textbook which allows you to visualize tables and execute queries using a sample database. This tutorial explains the fundamental concepts and constructs of SQL. At the same time, it displays examples for different levels of expertise in order to help you learn better and move on to other lessons. Once you’ve learned all the lessons, put your skills to the test using the sister site <a href="http://www.sql-ex.ru/"><b>SQL Exercises</b></a>.</p>
<p><b>Bottom line</b></p>
<p>If you have ever wanted to learn SQL but didn’t know how, the resources in this post will help you out. They cover the basics, move on to more advanced lessons and courses, and allow you to test your skills using quizzes and sample databases. These sites show that you don’t need to spend a fortune to learn more about SQL; in many cases you can do it for free.</p>
]]></content:encoded>
			<wfw:commentRss>http://bigdata-madesimple.com/9-useful-resources-for-those-who-wants-to-know-more-about-sql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>5 common SQL query design mistakes to avoid at all costs</title>
		<link>http://bigdata-madesimple.com/5-common-sql-query-design-mistakes-to-avoid-at-all-costs/</link>
		<comments>http://bigdata-madesimple.com/5-common-sql-query-design-mistakes-to-avoid-at-all-costs/#comments</comments>
		<pubDate>Mon, 27 Feb 2017 11:12:19 +0000</pubDate>
		<dc:creator>Ahamed Meeran</dc:creator>
				<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://54.179.177.208/?p=20882</guid>
		<description><![CDATA[<p>To run SQL Server databases successfully, you must be keen on query design. Unfortunately, most people do not...</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/5-common-sql-query-design-mistakes-to-avoid-at-all-costs/">5 common SQL query design mistakes to avoid at all costs</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>To run SQL Server databases successfully, you must be keen on query design. Unfortunately, most people do not give the design process a second thought. As a result, they make simple mistakes that, though easy to avoid, have far-reaching consequences.</p>
<p>For starters, with poorly written queries, you cannot guarantee users lightning-fast retrieval times. Your servers will also be plagued by problems from day one. And in today&#8217;s digital world, these are mistakes you cannot afford to make. But, how do you avoid making these mistakes? Here are tips on how to go about it.</p>
<p><strong>1. Failing to Review Your Data Model</strong></p>
<p>Your data model determines how users access data. So, think your model through right from the beginning. If you do not, you will have to deal with unwieldy queries and complicated code down the line, and both impact negatively on performance. An easy way to figure out which queries are needed to access data is to print out your data model.</p>
<p>Or, better still, have a data modelling tool do it for you. A print-out or modelling tool lets you see what you are up against. You are, therefore, in a better position to simplify code, reduce coding time, increase accuracy, and improve performance.</p>
<p><strong>2. Failing to Consider Your Technique</strong></p>
<p>What technique do you use? Is it cursor logic, or set-based logic? There is no easy answer to this particular question: it all depends on which performs best for your needs. Take set-based logic, for instance. It is the obvious choice for database access. After all, SQL Server is designed for it. But cursor logic can in some instances outperform set-based logic. The key is not to use one technique when the other would be better.</p>
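<p>The contrast between the two techniques can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python's built-in <code>sqlite3</code> module as a stand-in for SQL Server, and the <code>orders</code> table, its columns, and both function names are hypothetical.</p>

```python
import sqlite3

# Hypothetical table and data, used only to contrast the two styles.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, discounted REAL)"
)
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(i, float(i * 10)) for i in range(1, 6)],
)


def discount_row_by_row(conn, rate):
    """Cursor-style logic: fetch every row, then issue one UPDATE per row."""
    rows = conn.execute("SELECT id, amount FROM orders").fetchall()
    for order_id, amount in rows:
        conn.execute(
            "UPDATE orders SET discounted = ? WHERE id = ?",
            (amount * (1 - rate), order_id),
        )


def discount_set_based(conn, rate):
    """Set-based logic: a single statement describes the change for all rows."""
    conn.execute("UPDATE orders SET discounted = amount * (1 - ?)", (rate,))


discount_row_by_row(conn, 0.5)
cursor_result = conn.execute(
    "SELECT id, discounted FROM orders ORDER BY id"
).fetchall()

discount_set_based(conn, 0.5)
set_result = conn.execute(
    "SELECT id, discounted FROM orders ORDER BY id"
).fetchall()

# Both approaches produce the same rows; they differ in how many
# statements the database has to process.
print(cursor_result == set_result)
```

<p>Both functions leave the table in the same state. The row-by-row version sends one statement per row, which is the pattern to question first when a query is slow.</p>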
<p><strong>3. Not Using Old Coding Techniques</strong></p>
<p>When you use tried-and-tested coding techniques, you seldom land in trouble. Even coding methods you learned from SQL Server 2005 can prove useful today. Try to use the TRY&#8230;CATCH error handling technique in your coding. The results may surprise you. Using <a href="https://facility9.com/2008/12/a-quick-introduction-to-common-table-expressions-3/">Common Table Expressions</a> for hierarchies, or the Common Language Runtime (CLR) <a href="https://blog.jooq.org/2013/10/03/the-10-most-popular-db-engines-sql-and-nosql/">database engine</a> may also leave you surprised.</p>
<p>If you need help brushing up on old techniques, do some revision and look for articles online. There are plenty out there. <a href="http://www.acuitytraining.co.uk/server-database-programming/sql-training/using-union-queries-in-sql-and-access">Here</a> and <a href="http://www.sql-join.com/">here</a> are a couple of SQL examples.</p>
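<p>As a rough illustration of a Common Table Expression walking a hierarchy, here is a minimal sketch. It assumes SQLite (through Python's <code>sqlite3</code> module) rather than SQL Server, so the query is written with <code>WITH RECURSIVE</code>, whereas SQL Server writes the same pattern with a plain <code>WITH</code>. The <code>employees</code> table and its rows are made up for the example.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical org chart: each row points at its manager's id.
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ava", None), (2, "Ben", 1), (3, "Cal", 1), (4, "Dee", 2)],
)

# The anchor member selects the root; the recursive member joins each
# employee to the manager already present in the CTE, one level at a time.
hierarchy = conn.execute(
    """
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, chain.depth + 1
        FROM employees AS e
        JOIN chain ON e.manager_id = chain.id
    )
    SELECT name, depth FROM chain ORDER BY depth, id
    """
).fetchall()

for name, depth in hierarchy:
    print("  " * depth + name)
```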
<p><strong>4. Not Taking Advantage of Peer Review</strong></p>
<p>Before deploying your query plans, you should have someone else review them. Chances are that other people will see what you have missed. Their feedback on your indexes and query performance often helps you to further improve your code. They could also learn a thing or two from you in the process, and vice versa.</p>
<p><strong>5. Failing to Test Your Queries</strong></p>
<p>Developers hate having to test code. First, it&#8217;s rigorous. And second, the testing environment (hardware and data) rarely matches the real production environment. But testing is a necessary, and unavoidable, part of coding. So, thoroughly test your code, and where possible, try to mimic the final production environment as closely as possible. Remember, your queries might perform well with a few hundred records, but not against millions in the final environment.</p>
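<p>One way to approximate this kind of test is to load a larger volume of synthetic rows and inspect the query plan before and after a change. The sketch below is an assumption-laden illustration: it uses SQLite's <code>EXPLAIN QUERY PLAN</code> through Python's <code>sqlite3</code> module instead of SQL Server's execution plans, and the <code>events</code> table, the index name, and the row count are invented for the example.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
# Load a volume closer to production than a handful of hand-typed rows.
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    ((i % 1000, "x") for i in range(100_000)),
)

QUERY = "SELECT COUNT(*) FROM events WHERE user_id = ?"


def plan(conn):
    """Return SQLite's description of how it would run QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


plan_before = plan(conn)  # expected to describe a full table scan
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = plan(conn)   # expected to mention idx_events_user

match_count = conn.execute(QUERY, (42,)).fetchone()[0]
print(plan_before)
print(plan_after)
print(match_count)
```

<p>The exact wording of the plan text varies between SQLite versions, so it is safer to check for the index name than to compare whole strings.</p>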
<p><strong>Conclusion</strong></p>
<p>Queries determine the speed and performance of an SQL database. So, try to avoid common mistakes such as not reviewing your data model, or failing to consider which technique to use. Others are failing to use old coding techniques, not taking advantage of peer review mechanisms, and failing to test your queries.</p>
]]></content:encoded>
			<wfw:commentRss>http://bigdata-madesimple.com/5-common-sql-query-design-mistakes-to-avoid-at-all-costs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DNA vs modern backup methods: The future of data storage</title>
		<link>http://bigdata-madesimple.com/dna-vs-modern-backup-methods-the-future-of-data-storage-2/</link>
		<comments>http://bigdata-madesimple.com/dna-vs-modern-backup-methods-the-future-of-data-storage-2/#comments</comments>
		<pubDate>Mon, 12 Dec 2016 11:09:55 +0000</pubDate>
		<dc:creator>Ahamed Meeran</dc:creator>
				<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://54.179.177.208/?p=20686</guid>
		<description><![CDATA[<p>It’s difficult to wrap one’s mind around this, but it’s now possible to store vast amounts of data...</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/dna-vs-modern-backup-methods-the-future-of-data-storage-2/">DNA vs modern backup methods: The future of data storage</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>It’s difficult to wrap one’s mind around this, but it’s now possible to store vast amounts of data on a DNA strand. Isn’t it ironic that so much of the world’s data is stored via computerized data centers that are the result of many years of information technology and development, yet bacterial DNA has been with us for years and has been capable of doing the job better than anything else, all along?</p>
<p><a href="http://www.nature.com/news/how-dna-could-store-all-the-world-s-data-1.20496">Nature</a> argues that researchers must ensure information is reliably encoded and that nucleotide strings can be produced economically and efficiently, as well.  We’ll eventually have no choice, though, since—<a href="https://www.backupassist.com/blog/news/dna-data-storage-future/">according to BackupAssist</a>—“the digital data we need to store is increasing at an unsustainable rate for our current hardware.”  This seems like all the more reason to streamline data use and make sure no more is stored than is absolutely necessary.  It does seem, though, that we’re heading toward a kind of science fiction version of reality, in a way.</p>
<p><a href="https://www.scientificamerican.com/article/tech-turns-to-biology-as-data-storage-needs-explode/">Scientific American quotes Microsoft’s Karin Strauss</a> envisioning what this DNA storage and retrieval might look like via “archival DNA storage services,” within the next decade: “You could open your browser and upload files to their site or get your bytes back, like cloud storage,” she says.  “Or, with as yet unrealized breakthroughs in DNA synthesis and sequencing, ‘you could buy a DNA drive instead of a disk drive.’”</p>
<p>So there’s the future, and there’s what we have now.  There’s network-attached storage and online or <a href="http://www.mycollegelaptop.com/tech-updates/free-and-cheap-ways-back-your-computer-files-online/">cloud-based storage</a> like Dropbox, Google Drive, and iCloud.  For smaller-scale data storage, there are flash memory drives and external hard drives.  However, businesses considering cloud-based storage are often concerned with data security; <a href="http://storage.cioreview.com/cxoinsight/the-future-of-data-backup-in-the-cloud-and-free-nid-12252-cid-12.html">CIO Review</a> identifies SkyHigh Networks and Netskope as two cloud-security companies that are filling the gap.  This concern with security will continue to be a prominent factor as cloud-based storage and backup become the norm, as opposed to on-site enterprise software.</p>
<p>The problem with many of these cloud-based services is also what makes them convenient: because they are accessible from anywhere, they are more prone to hacking when accessed from a location other than the workplace, such as a public Internet café or a home computer shared by multiple users.  Another possible point of weakness is the recent prevalence of <a href="http://businessdegrees.uab.edu/resources/infographics/promoting-data-security-in-the-workplace/">BYOD in the workplace</a>.  If an employee signs into network drives from a personal computer or smartphone while at work, then transmits unsecured data using that device, the data will be more vulnerable to hackers and other people outside the network.  Password-protected documents and devices will help ensure security of information, regardless of the device being used.</p>
<p><a href="https://msdn.microsoft.com/en-us/library/bb727010.aspx">Microsoft provides a guide</a> to different backup techniques, as well as a bit of information as to different types of backup hardware required to perform backups—information useful to the layman, for practical reference.  In addition to full backups, there are differential backups, incremental backups, and daily backups.  Microsoft recommends doing a combination of full and partial backups on a weekly basis.</p>
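<p>The difference between these backup types comes down to which files each one copies. The sketch below is a simplified model, not Microsoft's implementation: it assumes a differential backup copies everything changed since the last full backup and an incremental backup copies everything changed since the most recent full or incremental backup. The <code>FileRecord</code> class, the day numbers, and the file names are hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str
    modified_day: int  # hypothetical "day number" of the last change


def files_to_back_up(files, kind, last_full_day, last_backup_day):
    """Return the paths a backup of the given kind would copy.

    full         -> every file
    differential -> files changed since the last full backup
    incremental  -> files changed since the most recent full or
                    incremental backup
    """
    if kind == "full":
        cutoff = None
    elif kind == "differential":
        cutoff = last_full_day
    elif kind == "incremental":
        cutoff = last_backup_day
    else:
        raise ValueError(f"unknown backup kind: {kind!r}")
    return [f.path for f in files if cutoff is None or f.modified_day > cutoff]


files = [
    FileRecord("report.docx", modified_day=1),
    FileRecord("budget.xlsx", modified_day=4),
    FileRecord("notes.txt", modified_day=6),
]

# Assume a full backup ran on day 2 and an incremental backup on day 5.
full = files_to_back_up(files, "full", last_full_day=2, last_backup_day=5)
differential = files_to_back_up(files, "differential", last_full_day=2, last_backup_day=5)
incremental = files_to_back_up(files, "incremental", last_full_day=2, last_backup_day=5)

print(full)
print(differential)
print(incremental)
```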
<p>For now, the dominant trend in data storage and backup is cloud-based services, though the transition is a slow one.  <a href="http://www.computerweekly.com/blog/StorageBuzz/VMware-Decades-of-hybrid-cloud-ahead">Antony Adshead predicts</a> that “The tipping point at which public cloud operations attain a 50% share of IT workloads will come in 2030.  Until then, and beyond, we face ‘decades of a hybrid [cloud] world.’”  <a href="http://cloudstorageadvice.com/what-is-cloud-storage/">The hybrid cloud</a> combines a few features of private and public clouds, while also allowing for advanced customization; you can pick and choose which bits of information are stored on the private versus the public side, depending on the level of sensitivity.</p>
<p align="center"><b>*   *   *</b></p>
<p>As we move into the future, cloud-based computing is likely to become more and more prominent, allowing us to access the majority of our information for work and personal use from anywhere.  The need for effective cybersecurity will also increase, but cloud-based data platform companies are doing a good job of anticipating these concerns by <a href="http://www.investors.com/research/industry-snapshot/cisco-palo-alto-symantec-gear-up-for-cloud-cyber-security/">acquiring cybersecurity companies</a>—known in the ‘cloud-world’ as cloud access security brokers (CASB)—at an increased rate.  Expect to see more merging of data storage platforms with security-based companies as the move to cloud becomes increasingly widespread.</p>
]]></content:encoded>
			<wfw:commentRss>http://bigdata-madesimple.com/dna-vs-modern-backup-methods-the-future-of-data-storage-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Database corner: Beginner&#8217;s guide to Mysql storage engines</title>
		<link>http://bigdata-madesimple.com/database-corner-begineers-guide-to-mysql-storage-engines/</link>
		<comments>http://bigdata-madesimple.com/database-corner-begineers-guide-to-mysql-storage-engines/#comments</comments>
		<pubDate>Tue, 01 Nov 2016 16:24:46 +0000</pubDate>
		<dc:creator>Manu Jeevan</dc:creator>
				<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://bigdata-madesimple.com/?p=20087</guid>
		<description><![CDATA[<p>When a database is created, one often overlooked but critical factor in performance is the storage engine (particularly...</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/database-corner-begineers-guide-to-mysql-storage-engines/">Database corner: Beginner&#8217;s guide to Mysql storage engines</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>When a database is created, one often overlooked but critical factor in performance is the storage engine (particularly as the database grows). In many instances, the temptation is to just accept the default and continue on developing your project. This can lead to unexpected negative impacts on performance, backups, and data integrity later in the application life cycle, such as when your team implements analytics and <a href="https://www.sisense.com/connectors/mysql-reporting/">MySQL dashboards.</a></p>
<p>To avoid these potential pitfalls, we are going to take a closer look at some of the most widely used storage engines supported by MySQL (as of version 5.7).</p>
<p><strong>Supported Storage Engines</strong></p>
<p><strong>What are my options?</strong></p>
<p>By default, MySQL 5.7 supports ten storage engines (InnoDB, MyISAM, Memory, CSV, Archive, Blackhole, NDB, Merge, Federated, and Example). To see which ones are available and supported by your server, use this command:</p>
<p><em>mysql&gt; SHOW ENGINES\G</em></p>
<p>This will output a list of storage engines and tell you which are available, not available, or which is currently set to the default. The “Support:” column will display ‘YES’, ‘NO’, or ‘DEFAULT’, respectively.</p>
<p>In some applications, the need may arise to have different storage engines for different tables in the same database. This is an example of why you need to carefully <a href="https://www.sisense.com/blog/7-data-modeling-mistakes-that-can-sink-your-analysis/">plan the data model</a> for your application. In most cases, however, only one storage engine will be needed.</p>
<p><strong>Storage Engine Capabilities</strong></p>
<p><strong>What are they good at?</strong></p>
<p>Let’s take a closer look at some of the most commonly used storage engines. This will give us an idea of what each engine was designed to do and how they can best be used to serve our business goals.</p>
<p><strong>InnoDB:</strong> The default option in MySQL 5.7, InnoDB is a robust storage engine that offers:</p>
<ul>
<li>Full ACID compliance</li>
<li>Commit, rollback, and crash-recovery</li>
<li>Row-level locking</li>
<li>FOREIGN KEY referential-integrity constraints</li>
<li>Increased multi-user concurrency (via non-locking reads)</li>
</ul>
<p>With the above functionality that InnoDB offers, it is obvious why it is the default engine in MySQL. It is an engine that performs well and offers many of the required attributes that any database would need. However, a comprehensive discussion of all of its capabilities is outside the scope of this article. This is the engine that will most likely be used in the majority of applications.</p>
<p><strong>MyISAM:</strong> MyISAM is distinguished by the following characteristics:</p>
<ul>
<li>full text search indexes</li>
<li>table-level locking</li>
<li>lack of support for transactions</li>
</ul>
<p>Though it is a fast storage engine, it is best suited to read-heavy applications, such as data warehousing and web applications, that don’t need transaction support or ACID compliance.</p>
<p><strong>NDB</strong> (or NDBCLUSTER): If a clustered environment is where your database will be working, NDB is the storage engine of choice. It is best when you need:</p>
<ul>
<li>Distributed computing</li>
<li>High-redundancy</li>
<li>High-availability</li>
<li>The highest possible uptimes</li>
</ul>
<p>Take note that support for NDB is not included in the standard MySQL Server 5.7 binaries. You will have to switch to the latest binary release of MySQL Cluster. If you’re developing in a cluster environment, though, you probably have the necessary experience to deal with these tasks.</p>
<p><strong>CSV:</strong> A useful storage engine when data needs to be shared with other applications that use CSV formatted data. The tables are stored as comma separated value text files. Though this makes sharing the data with scripts and applications easier, one drawback is that the CSV files are not indexed. So, the data should be stored in an InnoDB table until the Import/Export stage of the process.</p>
<p><strong>Blackhole:</strong> This engine accepts but does not store data. Similar to the UNIX /dev/null, queries always return an empty set. This can be useful in a distributed database environment where you do not want to store data locally or in performance or other testing situations.</p>
<p><strong>Archive:</strong> Just as the name implies, this engine is excellent for seldom-referenced historical data. The tables are not indexed and compression happens upon insert. Transactions are not supported. Use this storage engine for archiving and retrieving past data.</p>
<p><strong>Federated:</strong> This storage engine is for creating a single, local, logical database by linking several different physical MySQL servers. No data is stored on the local server and queries are automatically executed on the respective remote server. It is perfect for distributed data mart environments and can vastly improve performance when <a href="https://www.sisense.com/blog/pros-cons-using-mysql-analytical-reporting/">using MySQL for analytical reporting</a>.</p>
<p><strong>Designating a storage engine</strong></p>
<p><strong>How do I change which storage engine is used?</strong></p>
<p>The storage engine that is used is established upon table creation. As previously stated, InnoDB is the default storage engine in MySQL versions 5.5 and higher. If you would like to use a different one, it is best to do this within your CREATE TABLE statement. For instance, let’s say that you have identified a table that needs to use the CSV storage engine. Your overly simplified CREATE TABLE statement might look like this:</p>
<p><em>mysql&gt; CREATE TABLE Shared_Data (</em></p>
<p><em>    -&gt; Data_ID INTEGER NOT NULL,</em></p>
<p><em>    -&gt; Name VARCHAR(50) NOT NULL,</em></p>
<p><em>    -&gt; Description VARCHAR(150) NOT NULL</em></p>
<p><em>    -&gt; ) ENGINE=CSV;</em></p>
<p>We would then run an INSERT statement as usual:</p>
<p><em>mysql&gt; INSERT INTO Shared_Data VALUES</em></p>
<p><em>    -&gt; (1, 'device one', 'the latest version of the best tech'),</em></p>
<p><em>    -&gt; (2, 'device two', 'the fastest one on the market');</em></p>
<p>Upon success, if you inspect the database directory, there should now be a ‘Shared_Data.CSV’ file in it that contains the records you have inserted into the Shared_Data table.</p>
<p>The same methodology can be used for any one of the many storage engines that MySQL supports. Though it is possible to change the storage engine after a table has been created with an <code>ALTER TABLE</code> statement, it is best practice to plan accordingly and set it in the beginning.</p>
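<p>For reference, MySQL's statement for switching engines has the form <code>ALTER TABLE table_name ENGINE = engine_name</code>. The helper below is a hypothetical sketch that only builds that statement as a string and does not connect to a server; the function name and the identifier check are assumptions added for illustration.</p>

```python
import re

# Conservative pattern for unquoted identifiers; anything else is rejected
# rather than interpolated into the SQL string.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def alter_engine_sql(table, engine):
    """Build an ALTER TABLE statement that switches a table's storage engine."""
    for identifier in (table, engine):
        if not _IDENTIFIER.fullmatch(identifier):
            raise ValueError(f"unsafe identifier: {identifier!r}")
    return f"ALTER TABLE {table} ENGINE = {engine}"


# Move the example table from the CSV engine back to InnoDB.
statement = alter_engine_sql("Shared_Data", "InnoDB")
print(statement)
```

<p>Changing the engine typically rebuilds the table, which can be slow on large tables. That is one more reason to choose the engine at creation time.</p>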
<p><strong>In Closing</strong></p>
<p><strong>MySQL has many options</strong></p>
<p>As you can see, MySQL offers support for storage engines designed to handle very different tasks in many different environments. Identifying which engines to use and when to use them can help us avoid unnecessary complications and performance issues as our applications scale.</p>
<p>Whether you need 99.999% uptime and reliability on your distributed computing cluster or you need ACID compliant transaction support with FOREIGN KEY constraints, MySQL has a storage engine to suit your needs.</p>
<p>As always, proper planning and identification of your project goals and requirements is the best way to accurately identify which storage engines are best suited for your application. Hopefully, this article serves as a useful starting point for helping you in that respect.</p>
<p>Originally appeared on <a href="https://www.sisense.com/blog/beginners-guide-to-mysql-storage-engines/" target="_blank">Sisense blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://bigdata-madesimple.com/database-corner-begineers-guide-to-mysql-storage-engines/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Relational vs Non-Relational data bases &#8211; Part 3</title>
		<link>http://bigdata-madesimple.com/relational-vs-non-relational-data-bases-part-3/</link>
		<comments>http://bigdata-madesimple.com/relational-vs-non-relational-data-bases-part-3/#comments</comments>
		<pubDate>Wed, 16 Sep 2015 07:06:25 +0000</pubDate>
		<dc:creator>Manu Jeevan</dc:creator>
				<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://bigdata-madesimple.com/?p=15493</guid>
		<description><![CDATA[<p>In the first and second part of this blog series, we saw some basic differences between scalability of Relational and Non-Relational Databases....</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/relational-vs-non-relational-data-bases-part-3/">Relational vs Non-Relational data bases &#8211; Part 3</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>In the <a href="http://bigdata-madesimple.com/relational-vs-non-relational-databases-part-1/" target="_blank">first</a> and <a href="http://bigdata-madesimple.com/relational-vs-non-relational-databases-part-2/" target="_blank">second</a> part of this blog series, we saw some basic differences between scalability of Relational and Non-Relational Databases. In this post, I will show you how to use these databases correctly, and also tell you about some well known companies that use these databases.</p>
<p>Relational Database</p>
<p>In the first part of this blog series, I talked about ACID properties. These properties are essential for strict transactional integrity. In industries such as banking and retail, every transaction requires ACID guarantees. In a bank transfer, if one account is credited, another must be debited. A partial update is never allowed, because it would break data integrity. Oracle, SQL Server, MySQL and other RDBMSs are used in this scenario.</p>
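<p>The all-or-nothing behaviour of a bank transfer can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a production RDBMS, and the accounts table, its CHECK constraint and the transfer function are hypothetical names made up for the example.</p>
<pre>
```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one atomic unit."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft, so the credit is undone too.
        return False

# Hypothetical schema: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

print(transfer(conn, 1, 2, 30))   # True: credit and debit are committed together
print(transfer(conn, 1, 2, 500))  # False: the credit is rolled back with the failed debit
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
```
</pre>
<p>After the failed second transfer, the balances are the same as they were after the first one, which is the &#8220;no partial update&#8221; guarantee described above.</p>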
<p>Non-Relational Database (NoSQL DB)</p>
<p>In the first part of this blog series, I also talked about BASE properties. These properties keep data eventually consistent across all nodes in a distributed database. Any information that does not require strict data integrity can be stored in a NoSQL DB. For instance, the contents of a <a href="http://www.webopedia.com/TERM/S/search_engine.html" target="_blank">Search Engine System</a> can be stored in a non-relational database, because it is easy to retrieve information quickly. A good example of a search engine system is Google. Google usually stores its cached web pages in a web layer that is refreshed periodically. These databases can store terabytes of historical data (say, a bank's credit card transactions for the past 5 years) in a distributed environment. It is easy to analyse and mine data in a NoSQL DB using the SQL-like Hive data warehouse software. NoSQL DBs can be used to store massive volumes of unstructured data and are suitable for text analysis too.</p>
<p>Here are some top organisations that use these databases:</p>
<p>Relational Databases</p>
<p><a href="http://www.microsoft.com/en-in/server-cloud/products/sql-server/" target="_blank">SQL Server</a>: LG Electronics, MySpace, Hilton Hotels.</p>
<p><a href="http://www.dbms2.com/2008/09/24/some-of-oracles-largest-data-warehouses/" target="_blank">ORACLE</a>: British Telecom, MasterCard, Reliance Ltd.</p>
<p><a href="http://www.mysql.com/customers/" target="_blank">MySQL</a>: Facebook, <a href="http://highscalability.com/blog/2011/12/19/how-twitter-stores-250-million-tweets-a-day-using-mysql.html" target="_blank">Twitter</a>, LinkedIn. <a href="https://gigaom.com/2011/12/06/facebook-shares-some-secrets-on-making-mysql-scale/" target="_blank">Facebook uses MySQL</a> to store user interactions such as status updates, shares and likes.</p>
<p>Non-Relational Databases</p>
<p><a href="http://www.couchbase.com/customer-stories" target="_blank">CouchBase</a>: LinkedIn, AdAction.</p>
<p><a href="http://wiki.apache.org/cassandra/Cassandra" target="_blank">Cassandra</a>: Facebook, Twitter, Digg.</p>
<p><a href="http://www.mongodb.org/about/production-deployments/" target="_blank">MongoDB</a>: LinkedIn, Pearson.</p>
<p><a href="http://www.neotechnology.com/customers/" target="_blank">Neo4j</a>: Cisco, eBay, etc.</p>
<p>As you have seen, companies like Facebook, Twitter and LinkedIn use both Relational and Non-Relational Databases, based on their requirements.</p>
<p>Now let me return to the first part of this series, and answer the following questions:</p>
<p>Are relational databases capable of handling big data?</p>
<p>Are relational databases scalable?</p>
<p>Are relational databases suited to modern data requirements, such as real-time analytics and dealing with unstructured data?</p>
<p>The answer to all these questions is a vehement “YES”. Relational databases are not going away in this social world. The right database should be chosen based on the nature and complexity of the data set. Both Relational and Non-Relational Databases have their own advantages and disadvantages. A well-designed environment can use relational and non-relational databases together, as Facebook, Twitter and LinkedIn have done.</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/relational-vs-non-relational-data-bases-part-3/">Relational vs Non-Relational data bases &#8211; Part 3</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://bigdata-madesimple.com/relational-vs-non-relational-data-bases-part-3/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>10 interesting facts and tips about MySQL</title>
		<link>http://bigdata-madesimple.com/10-interesting-facts-and-tips-about-mysql/</link>
		<comments>http://bigdata-madesimple.com/10-interesting-facts-and-tips-about-mysql/#comments</comments>
		<pubDate>Thu, 22 Jan 2015 07:02:10 +0000</pubDate>
		<dc:creator>Baiju NT</dc:creator>
				<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.bigdata-madesimple.com/?p=12401</guid>
		<description><![CDATA[<p>MySQL is the fastest growing open-source relational database management system with 100 million downloads to date. It is...</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/10-interesting-facts-and-tips-about-mysql/">10 interesting facts and tips about MySQL</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>MySQL is the fastest growing open-source relational database management system with 100 million downloads to date. It is a popular choice of database for web applications and is currently used by many large websites, including Facebook, Twitter, Wikipedia, Flickr and YouTube. Now, let&#8217;s look at 10 interesting facts and tips about MySQL.</p>
<p>1. MySQL supports up to 64 indexes per table. Each index may consist of 1 to 16 columns. The maximum index size is 1000 bytes (767 for InnoDB).</p>
<p>2. The maximum size of a row in a MySQL table is 65,535 bytes. The maximum value of a signed INT is 2,147,483,647, and the maximum value of an unsigned INT is 4,294,967,295. In a mixed table with both CHAR and VARCHAR columns, MySQL will change the CHARs to VARCHARs.</p>
<p>3. If a PRIMARY KEY or UNIQUE index consists of only one column that has an integer type, you can also refer to the column as &#8220;_rowid&#8221; in SELECT statements.</p>
<p>4. To change the value of the AUTO_INCREMENT, use &#8220;ALTER TABLE &lt;Tablename&gt; AUTO_INCREMENT = value;&#8221; or &#8220;SET INSERT_ID = value;&#8221;</p>
<p>5. To restrict MySQL from being accessed publicly, use the &#8220;skip-networking&#8221; option in the config file. When it is enabled, MySQL listens only to local socket connections and ignores all TCP ports. Alternatively, setting the &#8220;bind-address&#8221; parameter to &#8220;127.0.0.1&#8221; makes MySQL accessible only from the localhost.</p>
<p>6. If MySQL sees many new connections (e.g. a website without persistent connections), you can improve performance by setting thread_cache_size to a non-zero value. 16 is a good value to start with. Increase the value until Threads_created stops growing quickly.</p>
<p>7. NO_AUTO_VALUE_ON_ZERO suppresses auto increment for 0. Only NULL generates the next sequence number. This mode can be useful if 0 has been stored in a table&#8217;s AUTO_INCREMENT column. (Storing 0 is not a recommended practice, by the way.)</p>
<p>8. The configuration options &#8220;innodb_analyze_is_persistent&#8221;, &#8220;innodb_stats_persistent_sample_pages&#8221; and &#8220;innodb_stats_transient_sample_pages&#8221; provide improved accuracy of InnoDB index statistics, and consistency across MySQL restarts. InnoDB precomputes statistics that help the optimizer decide which indexes to use in a query, by sampling a portion of the index. You can adjust the amount of sampling that InnoDB does for each index. The resulting statistics can now persist across server restarts, rather than being recomputed (and possibly changing) due to restarts and some runtime events. The more accurate statistics can improve query performance, and the persistence aspect can keep query performance stable. When the persistent stats feature is enabled, the statistics are only recomputed when you explicitly run ANALYZE TABLE for the table.</p>
<p>9. InnoDB frees up the memory associated with an opened table to ease the memory load on systems with huge numbers of tables. An LRU algorithm selects tables that have gone the longest without being accessed. To reserve more memory for open tables, increase the value of the --table_definition_cache=# configuration option.</p>
<p>10. Set the table_cache parameter to match the number of open tables and concurrent connections. Watch the open_tables value, and if it is growing quickly, increase the size of &#8220;table_cache&#8221;. Set the &#8220;open_files_limit&#8221; parameter to 20+max_connections+table_cache*2. If you have complex queries, &#8220;sort_buffer_size&#8221; and &#8220;tmp_table_size&#8221; are likely to be very important. Values will depend on query complexity and available resources, but 4MB and 32MB, respectively, are recommended starting points.</p>
<p>Note: These are &#8220;per connection&#8221; values, so consider your load and available resources when setting them. For example, sort_buffer_size is allocated only when MySQL needs to do a sort, but be careful not to run out of memory.</p>
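<p>The sizing rule in tip 10 and the &#8220;per connection&#8221; warning are simple arithmetic, sketched here in Python. The function names and the server figures (150 connections, a table cache of 400) are hypothetical, and the memory figure is only a rough bound that assumes every connection allocates one sort buffer at the same moment.</p>
<pre>
```python
def file_limit_estimate(max_connections, table_cache):
    """Rule of thumb from tip 10: 20 + max_connections + table_cache * 2."""
    return 20 + max_connections + table_cache * 2

def peak_sort_memory_mb(max_connections, sort_buffer_size_mb):
    """Rough bound: one sort buffer per connection, all allocated at once."""
    return max_connections * sort_buffer_size_mb

# Hypothetical server: 150 connections, table cache of 400, 4 MB sort buffers.
print(file_limit_estimate(150, 400))  # 20 + 150 + 800 = 970
print(peak_sort_memory_mb(150, 4))    # 150 * 4 = 600 (MB)
```
</pre>
<p>Even a modest 4MB buffer adds up to hundreds of megabytes under load, which is why the starting values should be checked against available memory.</p>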
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/10-interesting-facts-and-tips-about-mysql/">10 interesting facts and tips about MySQL</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://bigdata-madesimple.com/10-interesting-facts-and-tips-about-mysql/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Top 10 best practices in MySQL</title>
		<link>http://bigdata-madesimple.com/top-10-best-practices-in-mysql/</link>
		<comments>http://bigdata-madesimple.com/top-10-best-practices-in-mysql/#comments</comments>
		<pubDate>Mon, 19 Jan 2015 10:07:11 +0000</pubDate>
		<dc:creator>Baiju NT</dc:creator>
				<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.bigdata-madesimple.com/?p=12367</guid>
		<description><![CDATA[<p>MySQL is the second most widely used open-source relational database management system in the world. It has become...</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/top-10-best-practices-in-mysql/">Top 10 best practices in MySQL</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>MySQL is the second most widely used open-source relational database management system in the world. It has become so popular because of its consistent fast performance, high reliability and ease of use. This article presents some of the best practices in MySQL.</p>
<p><strong>1. Always use proper datatype</strong></p>
<p>Use datatypes based on the nature of data. If you use irrelevant datatypes it may consume more space or may lead to errors.</p>
<p>Example: Using VARCHAR(20) instead of the DATETIME datatype to store date and time values leads to errors in date and time calculations, and also makes it possible to store invalid data.</p>
<p><strong>2. Use CHAR(1) over VARCHAR(1)</strong></p>
<p>If you are storing a single character, use CHAR(1) instead of VARCHAR(1), because VARCHAR(1) takes an extra byte to store the length.</p>
<p><strong>3. Use CHAR datatype to store only fixed length data</strong></p>
<p>Example: Using CHAR(255) instead of VARCHAR(255) will consume more space if the data is shorter than 255 characters.</p>
<p><strong>4. Avoid using regional date formats</strong></p>
<p>When you use the DATETIME or DATE datatype, always use the YYYY-MM-DD date format or the ISO date format that suits your SQL engine. Other regional formats like DD-MM-YYYY and MM-DD-YYYY will not be stored properly.</p>
<p><strong>5. Index key columns</strong></p>
<p>Make sure to index the columns which are used in JOIN clauses so that the query returns the result fast.</p>
<p>If you use an UPDATE statement that involves more than one table, make sure that all the columns used to join the tables are indexed.</p>
<p><strong>6. Do not use functions over indexed columns</strong></p>
<p>Using functions over indexed columns defeats the purpose of the index. Suppose you want to get data where the first two characters of the customer code are AK. Do not write</p>
<p>SELECT columns FROM table WHERE left (customer_code,2)=&#8217;AK&#8217;</p>
<p>but rewrite it using</p>
<p>SELECT columns FROM table WHERE customer_code like &#8216;AK%&#8217;</p>
<p>which can make use of the index and results in a faster response time.</p>
<p><strong>7. Use SELECT * only if needed</strong></p>
<p>Do not just blindly use SELECT * in the code. If there are many columns in the table, all will get returned which will slow down the response time particularly if you send the result to a front end application.</p>
<p>Explicitly type out the column names which are actually needed.</p>
<p><strong>8. Use ORDER BY Clause only if needed</strong></p>
<p>If you want to show the result in a front end application, let the application order the result set. Doing this in SQL may slow down the response time in a multi-user environment.</p>
<p><strong>9. Choose proper Database Engine</strong></p>
<p>If you develop an application that reads data more often than it writes (e.g. a search engine), choose the MyISAM storage engine.</p>
<p>If you develop an application that writes data more often than it reads (e.g. real-time bank transactions), choose the InnoDB storage engine.</p>
<p>Choosing the wrong storage engine will affect performance.</p>
<p><strong>10. Use EXISTS clause wherever needed</strong></p>
<p>If you want to check the existence of data, do not use</p>
<p>If (SELECT count(*) from Table WHERE col=&#8217;some value&#8217;)&gt;0</p>
<p>instead, use EXISTS clause</p>
<p>If EXISTS(SELECT * from Table WHERE col=&#8217;some value&#8217;)</p>
<p>which is faster in response time.</p>
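<p>The two existence checks in tip 10 can be compared side by side. The sketch below is an assumption-laden illustration rather than a benchmark: it runs on Python's built-in sqlite3 module instead of MySQL, and the customers table and both function names are hypothetical. It only shows that the two forms give the same answer, while EXISTS lets the engine stop at the first matching row.</p>
<pre>
```python
import sqlite3

# Hypothetical table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany("INSERT INTO customers (city) VALUES (?)",
                 [("Pune",), ("Chennai",), ("Pune",)])

def has_rows_count(conn, city):
    """Existence check via COUNT(*): every matching row is counted first."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM customers WHERE city = ?", (city,)).fetchone()
    return n > 0

def has_rows_exists(conn, city):
    """Existence check via EXISTS: the engine may stop at the first match."""
    (flag,) = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM customers WHERE city = ?)", (city,)).fetchone()
    return bool(flag)

for city in ("Pune", "Delhi"):
    print(city, has_rows_count(conn, city), has_rows_exists(conn, city))
```
</pre>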
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/top-10-best-practices-in-mysql/">Top 10 best practices in MySQL</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://bigdata-madesimple.com/top-10-best-practices-in-mysql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Top Facebook groups for Analytics, Big Data, Data Mining, Hadoop, NoSQL, Data Science</title>
		<link>http://bigdata-madesimple.com/top-facebook-groups-for-analytics-big-data-data-mining-hadoop-nosql-data-science/</link>
		<comments>http://bigdata-madesimple.com/top-facebook-groups-for-analytics-big-data-data-mining-hadoop-nosql-data-science/#comments</comments>
		<pubDate>Thu, 24 Jul 2014 11:20:17 +0000</pubDate>
		<dc:creator>Baiju NT</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Resources]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.bigdata-madesimple.com/?p=11679</guid>
		<description><![CDATA[<p>Facebook may not be the best place for professionals, but like LinkedIn, it has a good...</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/top-facebook-groups-for-analytics-big-data-data-mining-hadoop-nosql-data-science/">Top Facebook groups for Analytics, Big Data, Data Mining, Hadoop, NoSQL, Data Science</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>Facebook may not be the best place for professionals, but like LinkedIn, it has a good number of Big Data groups, communities and public forums that spread knowledge about technologies used to mine, manage and analyse data for businesses. This is our list of Facebook groups for Analytics, Big Data, Data Mining, Hadoop, NoSQL, Data Science etc.</p>
<p>1. <a href="https://www.facebook.com/groups/data.analytics/" target="_blank">Analytics, Data Mining, Predictive Modeling, Artificial Intelligence</a></p>
<p>2. <a href="https://www.facebook.com/groups/158386177549436/" target="_blank">APACHE HADOOP</a></p>
<p>3. <a href="https://www.facebook.com/groups/hadoop.group/" target="_blank">Apache Hadoop Ecosystem</a></p>
<p>4. <a href="https://www.facebook.com/groups/BigDataisonline/" target="_blank">Big Data</a></p>
<p>5. <a href="https://www.facebook.com/groups/434352233255448/" target="_blank">Big Data Analytics using R</a></p>
<p>6. <a href="https://www.facebook.com/groups/rhadoop/" target="_blank">Big Data Analytics with R and Hadoop</a></p>
<p>7. <a href="https://www.facebook.com/groups/bigdatahadoop/" target="_blank">Big data hadoop NOSQL Hive Hbase</a></p>
<p>8. <a href="https://www.facebook.com/groups/bigdatalearnings/" target="_blank">Big Data Learnings</a></p>
<p>9. <a href="https://www.facebook.com/groups/bigdatamy/" target="_blank">Big Data Malaysia</a></p>
<p>10. <a href="https://www.facebook.com/groups/bigdatastatistics/" target="_blank">Big Data, Data Science, Data Mining &amp; Statistics</a></p>
<p>11. <a href="https://www.facebook.com/groups/BigDataExpert/" target="_blank">BigData/Hadoop Expert</a></p>
<p>12. <a href="https://www.facebook.com/groups/chennaihadoop/" target="_blank">Chennai Hadoop and Big Data User Group</a></p>
<p>13. <a href="https://www.facebook.com/groups/machinelearningforum/" target="_blank">Data Mining / Machine Learning / AI</a></p>
<p>14. <a href="https://www.facebook.com/groups/dataminingsocialnetworks/" target="_blank">Data Mining/Big Data</a></p>
<p>15. <a href="https://www.facebook.com/groups/hadoop.admins/" target="_blank">Hadoop Administrators</a></p>
<p>16. <a href="https://www.facebook.com/groups/423391947699826/" target="_blank">Hadoop Developers India</a></p>
<p>17. <a href="https://www.facebook.com/groups/haddopinaction/" target="_blank">Hadoop in Action</a></p>
<p>18. <a href="https://www.facebook.com/groups/hadoopjobs/" target="_blank">Hadoop Jobs</a></p>
<p>19. <a href="https://www.facebook.com/groups/416616701771842/" target="_blank">Hadoop Material</a></p>
<p>20. <a href="https://www.facebook.com/groups/hadoopcrunch/" target="_blank">Hadoop User Group</a></p>
<p>21. <a href="https://www.facebook.com/groups/309899905745489/" target="_blank">ORACLE SQL PlSQL Discussions</a></p>
<p>22. <a href="https://www.facebook.com/groups/AllaboutSQLServer/" target="_blank">SQL Server Network</a></p>
<p>23. <a href="https://www.facebook.com/groups/thesqlpunegroup7/" target="_blank">SQL Server Pune User Group</a></p>
<p>24. <a href="https://www.facebook.com/groups/SQLBangalore/" target="_blank">SQLBangalore</a></p>
<p>25. <a href="https://www.facebook.com/groups/thesqlgeeks/" target="_blank">SQLServerGeeks</a></p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/top-facebook-groups-for-analytics-big-data-data-mining-hadoop-nosql-data-science/">Top Facebook groups for Analytics, Big Data, Data Mining, Hadoop, NoSQL, Data Science</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://bigdata-madesimple.com/top-facebook-groups-for-analytics-big-data-data-mining-hadoop-nosql-data-science/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Comparing SQL databases and Hadoop</title>
		<link>http://bigdata-madesimple.com/comparing-sql-databases-and-hadoop/</link>
		<comments>http://bigdata-madesimple.com/comparing-sql-databases-and-hadoop/#comments</comments>
		<pubDate>Wed, 16 Jul 2014 12:26:13 +0000</pubDate>
		<dc:creator>Baiju NT</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.bigdata-madesimple.com/?p=11571</guid>
		<description><![CDATA[<p>Hadoop is a framework for processing data, so what makes it better than standard relational databases? SQL (structured query...</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/comparing-sql-databases-and-hadoop/">Comparing SQL databases and Hadoop</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>Hadoop is a framework for processing data, so what makes it better than standard relational databases? SQL (structured query language) is by design targeted at structured data. Many of Hadoop’s initial applications deal with unstructured data such as text. From this perspective Hadoop provides a more general paradigm than SQL.</p>
<p>For working only with structured data, the comparison is more nuanced. In principle, SQL and Hadoop can be complementary, as SQL is a query language which can be implemented on top of Hadoop as the execution engine.</p>
<p>But in practice, SQL databases tend to refer to a whole set of legacy technologies, with several dominant vendors, optimized for a historical set of applications. Many of these existing commercial databases are a mismatch to the requirements that Hadoop targets.</p>
<p>With that in mind, let’s make a more detailed comparison of Hadoop with typical SQL databases on specific dimensions.</p>
<p><strong>Scale-out instead of scale-up</strong></p>
<p>Scaling commercial relational databases is expensive. Their design is more friendly to scaling up. To run a bigger database you need to buy a bigger machine. In fact, it’s not unusual to see server vendors market their expensive high-end machines as “database-class servers.”</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/comparing-sql-databases-and-hadoop/">Comparing SQL databases and Hadoop</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://bigdata-madesimple.com/comparing-sql-databases-and-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to Calculate Confidence Intervals in SQL</title>
		<link>http://bigdata-madesimple.com/how-to-calculate-confidence-intervals-in-sql/</link>
		<comments>http://bigdata-madesimple.com/how-to-calculate-confidence-intervals-in-sql/#comments</comments>
		<pubDate>Sat, 10 May 2014 02:57:01 +0000</pubDate>
		<dc:creator>Baiju NT</dc:creator>
				<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.bigdata-madesimple.com/?p=10174</guid>
		<description><![CDATA[<p>Imagine you have a small online business. This month 200 users signed up on your website, and 10...</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/how-to-calculate-confidence-intervals-in-sql/">How to Calculate Confidence Intervals in SQL</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>Imagine you have a small online business. This month 200 users signed up on your website, and 10 of them bought your $800 service. Great! You&#8217;ve made $8k of income. How much should you expect to make this year?</p>
<p>The straightforward answer is $8k * 12 = $96k. But how confident should you be? Will your conversion rate always be so close to 5%? You could pad the estimate ±20% for safety, guessing at $77k to $115k. If $77k would cover all your expenses, should you feel secure?</p>
<p>This is a question of binomial probability. Using our favorite binomial confidence interval calculator, the 95% confidence interval for your conversion rate is about 2.5% to 9%.</p>
<p>With a confidence interval that wide, you should expect to make somewhere between $48k and $172k. Yikes! You could end up with half of your simple guess, and that&#8217;s if your business doesn&#8217;t change.</p>
<p>These confidence intervals are very informative, but turning to a calculator for every metric is tedious. If you&#8217;ve got hundreds of metrics across dozens of dashboards, it&#8217;s downright unsustainable.</p>
<p>Fortunately, the math for calculating a confidence interval is simple to implement:</p>
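<p>The formula itself is not part of this excerpt. As a rough sketch of the kind of calculation involved, the Python below uses the Wilson score interval, which is one common choice for binomial proportions. That choice is an assumption and may not be the method the calculator mentioned above uses. The function name is hypothetical, and the inputs are the numbers from the example: 10 sales out of 200 sign-ups, 200 sign-ups a month and $800 per sale.</p>
<pre>
```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half

low, high = wilson_interval(10, 200)

# Yearly revenue range: 200 sign-ups a month, 12 months, $800 per sale.
yearly_low = low * 200 * 12 * 800
yearly_high = high * 200 * 12 * 800

print(f"conversion rate: {low:.1%} to {high:.1%}")
print(f"yearly revenue: ${yearly_low:,.0f} to ${yearly_high:,.0f}")
```
</pre>
<p>With these inputs the interval comes out at roughly 2.7% to 9.0%, close to the range quoted above. The same arithmetic can be translated into a SQL expression, since it only needs division and a square root.</p>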
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/how-to-calculate-confidence-intervals-in-sql/">How to Calculate Confidence Intervals in SQL</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://bigdata-madesimple.com/how-to-calculate-confidence-intervals-in-sql/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Relational Vs Non-Relational databases – Part 2</title>
		<link>http://bigdata-madesimple.com/relational-vs-non-relational-databases-part-2/</link>
		<comments>http://bigdata-madesimple.com/relational-vs-non-relational-databases-part-2/#comments</comments>
		<pubDate>Tue, 06 May 2014 11:09:33 +0000</pubDate>
		<dc:creator>Baiju NT</dc:creator>
				<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://staging.bigdata-madesimple.com/?p=13169</guid>
		<description><![CDATA[<p>In my previous post, we have seen some fundamental differences between Relational and Non-Relational databases. In this post,...</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/relational-vs-non-relational-databases-part-2/">Relational Vs Non-Relational databases – Part 2</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>In my <a href="http://bigdata-madesimple.com/relational-vs-non-relational-databases-part-1/" target="_blank">previous post</a>, we saw some fundamental differences between Relational and Non-Relational databases. In this post, let&#8217;s talk about the scalability of the two.</p>
<p><strong>Scalability</strong></p>
<p>Scalability is the ability of a system to accommodate rapidly growing incoming data without significant performance problems. It is a key requirement for any system that handles growing data. There are two types of scaling methods, known as vertical and horizontal scaling.</p>
<p><strong>Vertical Scaling</strong></p>
<p>All relational database tools support vertical scaling. This is the method of increasing the power of the system by adding CPU, memory and disk space. To handle rapidly incoming data, the single production server is optimised to scale up. In this scaling technique there is always a single production server to which all applications and users connect. A cluster environment can be created with several nodes, with the data replicated across them. Because of ACID properties, all nodes must hold the same set of data, and data synchronization becomes complicated if there are several nodes in the cluster. This setup is well optimised for read scaling. Vertical scaling is also known as scale-up.</p>
<p>The benefit of this scaling methodology is the tight integration of data and its consistency across the nodes in a cluster. All nodes have the same set of data, and if there is a problem with the production server, the applications automatically connect to another node. Such a cluster is known as a fail-over cluster.</p>
<p><strong>Horizontal scaling</strong></p>
<p>All non-relational database tools support horizontal scaling. This is the method of adding more computers to the network to handle rapidly incoming data. It is easy to add more nodes to the cluster as the data grows. Data are split automatically and processed across the nodes in a cluster. This is a distributed data environment. The Hadoop Distributed File System (HDFS) is a classic example. Horizontal scaling is also known as scale-out.</p>
<p>The benefit of this scaling technique is that, since data are split and replicated across nodes, the application can still get the data from other nodes if any one node goes offline, which keeps the data available at all times. This method is very useful in cases where no JOINs are required among the data on different nodes. It also helps in separating data and keeping them in different geographical locations.</p>
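<p>How data can be split and replicated across nodes is sketched below. This is a simplified, hypothetical placement scheme (a CRC32 hash of the key picks the first node, and the next node in the list holds the copy); real systems such as HDFS use their own placement logic. The node names, the replica count and both function names are assumptions made for the example.</p>
<pre>
```python
import zlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
REPLICAS = 2  # each key is stored on this many nodes

def nodes_for(key):
    """Pick REPLICAS consecutive nodes, starting from a hash of the key."""
    start = zlib.crc32(key.encode("utf-8")) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]

def readable_from(key, offline):
    """Nodes that can still serve `key` when the `offline` nodes are down."""
    return [n for n in nodes_for(key) if n not in offline]

for key in ("user:1", "user:2", "order:99"):
    placed = nodes_for(key)
    # Take the first holder offline; the replica can still answer.
    print(key, placed, readable_from(key, {placed[0]}))
```
</pre>
<p>Because every key lives on two different nodes, losing any single node never makes a key unreadable, which is the availability benefit described above.</p>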
<p>While both of these scaling techniques have advantages and disadvantages, a good environment can mix the two to get the best of scale-up and scale-out. We can have a scale-up read and write database on a single server for data that requires ACID properties, and scale-out, distributed historical data across several nodes for data mining.</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/relational-vs-non-relational-databases-part-2/">Relational Vs Non-Relational databases – Part 2</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://bigdata-madesimple.com/relational-vs-non-relational-databases-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Can Microsoft&#8217;s SQL Server 2014 Compete with SAP HANA?</title>
		<link>http://bigdata-madesimple.com/can-microsofts-sql-server-2014-compete-with-sap-hana/</link>
		<comments>http://bigdata-madesimple.com/can-microsofts-sql-server-2014-compete-with-sap-hana/#comments</comments>
		<pubDate>Mon, 05 May 2014 09:26:45 +0000</pubDate>
		<dc:creator>Baiju NT</dc:creator>
				<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.bigdata-madesimple.com/?p=10095</guid>
		<description><![CDATA[<p>CEO Satya Nadella took the lid off several new innovations from Microsoft with the launch of SQL Server...</p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/can-microsofts-sql-server-2014-compete-with-sap-hana/">Can Microsoft&#8217;s SQL Server 2014 Compete with SAP HANA?</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>CEO Satya Nadella took the lid off several new innovations from Microsoft with the launch of SQL Server 2014. But what’s got many people buzzing is the so-called &#8220;big data in a box&#8221; appliance that could help Microsoft compete against the likes of SAP.</p>
<p>SQL Server 2014 is the latest version of the industry’s most deployed database. After announcing the new software, Nadella shared the company’s path to deliver a platform for the next era of “ambient intelligence.&#8221; Ambient intelligence deals with electronic environments that are both sensitive and responsive to people’s presence. It’s one vision for the future of consumer electronics.</p>
<p>“Developing the ability to convert data into the fuel for ambient intelligence is an ambitious challenge. It requires technology to understand context, derive intent and separate signal from noise,” said Nadella. “Building out a comprehensive platform that can enable this kind of ambient intelligence is a whole company initiative that we are uniquely qualified to undertake.”</p>
<p>The Benefits of Big Data</p>
<p>At the launch event, Nadella stressed the importance of a data culture &#8212; one that encourages curiosity, action and experimentation &#8212; for everyone and every organization. Also showcased were the results of a new IDC study that demonstrates the clear benefits for companies that take a comprehensive data approach. Specifically, these companies realize an additional 60 percent return on data assets. IDC figures it’s a $1.6 trillion opportunity worldwide. </p>
<p>The post <a rel="nofollow" href="http://bigdata-madesimple.com/can-microsofts-sql-server-2014-compete-with-sap-hana/">Can Microsoft&#8217;s SQL Server 2014 Compete with SAP HANA?</a> appeared first on <a rel="nofollow" href="http://bigdata-madesimple.com">Big Data Made Simple - One source. Many perspectives.</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://bigdata-madesimple.com/can-microsofts-sql-server-2014-compete-with-sap-hana/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
