Popular Blogs

K-Nearest Neighbor Machine Learning algorithm

The German credit dataset can be downloaded from the UC Irvine Machine Learning Repository; its outcome variable indicates whether a loan applicant defaulted. This post applies logistic regression with three variables (duration, amount, and installment), K-means clustering, and the K-Nearest Neighbor machine learning algorithm to the data. # Logistic regression # Load the file from the hard disk after setting the work directory germandata # Print dataset to see the pattern of the data g...


Brace yourself for weathering the data storm: RDBMS

RDBMS stands for relational database management system. In the early 1970s, Edgar F. Codd introduced the relational model on which these systems are built. It advanced the DBMS approaches of the time by bringing cardinality and normalization to database design. Codd also formulated 12 rules for a traditional RDBMS, and following these principles made databases more flexible and better integrated. (a) RDBMS deems all database management ...


The Architecture of Apache Hadoop

The Apache Hadoop platform is highly fault-tolerant against system failures. At its core are the Hadoop Distributed File System (HDFS) and MapReduce, which distribute high-velocity streams of big data across multiple racks of low-cost servers. HDFS imposes no limit on file size for storage, write, or read operations; any limit comes from the disk capacity of the machine, not from HDFS itself. HDFS also...


R has knives out for IBM SPSS and SAS

Introduction: Bell Labs conceived the S language in the mid-1970s to tackle data analysis and statistical problems. The project aimed to support statistical analysis within the company by leveraging Fortran libraries. The original S language did not include the functions needed for statistical computing. In the late 1980s, rewriting the source code in C reinvented the S languag...


Splunk in the Age of the Big Data

Introduction: Splunk is a big data tool that works with both structured and unstructured data. It offers a range of pre-configured, rapid-deployment packages that simplify big data content analysis and plug readily into corporate environments, letting organizations slice and dice their visualizations and turn data into actionable insights for decision-making in business intelligence competency centers. Similar to Tableau and QlikView, Splunk has powerful search...
