Overview/Description
Hadoop is an open-source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer. In this course, you'll learn about the theory of Flume as a tool for dealing with extraction and loading of unstructured data. You'll explore a detailed...
Overview/Description
Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and faster processing. This course explains the theory of Sqoop as a tool for dealing with extraction and loading of structured data from an RDBMS. You'll explore an explanation of Hive SQL statements and a demonstration of Hive in action.
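Conceptually, a Sqoop import reads rows from a relational table and writes them out as delimited text records destined for HDFS. The following is a minimal sketch of that extract-and-serialize step in plain Python, using an in-memory SQLite table as a stand-in for the source RDBMS (the table and column names are made up for illustration; the real tool connects over JDBC and writes to HDFS):

```python
import sqlite3

# Throwaway relational table standing in for the source RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, "Ana", "HR"), (2, "Ben", "IT")])

# Sqoop's default text import renders each row as a comma-delimited record;
# we emulate that serialization step here.
records = [",".join(str(col) for col in row)
           for row in conn.execute("SELECT id, name, dept FROM employees")]
print(records)  # ['1,Ana,HR', '2,Ben,IT']
```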
Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop Engineering...
Overview/Description
The core of Hadoop consists of a storage part, HDFS, and a processing part, MapReduce. Hadoop splits files into large blocks and distributes the blocks amongst the nodes in the cluster. To process the data, MapReduce ships code to the nodes that hold the required data, and those nodes then process it in parallel. This approach takes advantage of data locality to allow the data to be processed faster and more efficiently via distributed processing than by using a more conventional supercomputer architecture that relies on a parallel file system where computation...
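The map, shuffle, and reduce phases described above can be sketched with the canonical word-count example. This is a single-process simulation of the programming model only, not of the distributed execution; in a real cluster the framework runs each phase on many nodes:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input split.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big clusters", "data locality"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 2, 'data': 2, 'clusters': 1, 'locality': 1}
```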
Overview/Description
Apache Hadoop is a set of algorithms for distributed storage and distributed processing of Big Data on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are commonplace and thus should be automatically handled in software by the framework. In this course, you'll explore Hive as a SQL-like tool for interfacing with Hadoop. The course demonstrates the installation and configuration of Hive, followed by a demonstration of Hive in action. Finally, you'll learn about extracting and...
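To give a feel for what "SQL-like" means here: Hive compiles statements such as a GROUP BY aggregate into distributed jobs over data in Hadoop. As a minimal sketch (the table and column names below are hypothetical), the HiveQL query in the comment produces the same result as this direct in-memory computation:

```python
from collections import defaultdict

# A HiveQL-style aggregate (names invented for illustration):
#   SELECT dept, COUNT(*) FROM employees GROUP BY dept;
# Hive would compile this into a distributed job; here we compute the
# same result directly over in-memory sample rows.
employees = [("Ana", "HR"), ("Ben", "IT"), ("Cara", "IT")]

dept_counts = defaultdict(int)
for _name, dept in employees:
    dept_counts[dept] += 1
print(dict(dept_counts))  # {'HR': 1, 'IT': 2}
```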
Overview/Description
Hadoop is open-source software for affordable supercomputing. It provides the distributed file system and the parallel processing required to run a massive computing cluster. This course explains Pig as a data flow scripting tool for interfacing with Hadoop. You'll learn about the installation and configuration of Pig and explore a demonstration of Pig in action.
Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop Engineering team in roles such as Hadoop developer, data architect, or data engineer or...
Overview/Description
As the capabilities and capacities of modern computers continue to grow, data warehousing, and business intelligence more broadly, have become fundamental to maintaining business competitiveness. This course describes the concepts and methodologies involved in creating data warehouses and using them to extract business intelligence.
Target Audience
IT and business professionals with an interest in harvesting business intelligence from SQL Server databases
Prerequisites
None
Expected Duration (hours)
2.0
Lesson Objectives
Microsoft SQL Server Data...
Overview/Description
R is a free software environment for statistical computing and graphics and has become an important tool in modern data science. In this course, you will learn the fundamental R methods that data scientists use in their everyday work.
Target Audience
Individuals with statistics and programming experience who wish to learn the methods of data science in R.
Prerequisites
None
Expected Duration (hours)
2.0
Lesson Objectives
Fundamental Methods for Data Science in R
start the course
distinguish data science from statistics and computer science...
Overview/Description
R is a free software environment for statistical computing and graphics and has become an important tool in modern data science. In this course, you will learn the essential R machine learning methods that data scientists use in their everyday work.
Target Audience
Individuals with some statistics, programming, and machine learning experience who wish to learn the machine learning methods in R used in data science
Prerequisites
None
Expected Duration (hours)
2.0
Lesson Objectives
Machine Learning Examples for Data Science in R
start the course...
Overview/Description
Administering a MongoDB database requires ensuring that queries respond quickly enough and return the correct data to users. It also requires keeping the data available to users by implementing replication and verifying that it is correct. In this course, you'll learn how to create indexes and perform query optimization. You'll also learn how to configure replication and security.
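The value of an index for query optimization can be sketched without a server: a collection scan must inspect every document, while an index maps a field value straight to the matching documents. The plain-Python simulation below is purely conceptual (in MongoDB itself you would call `create_index("user")` on the collection and let the query planner use it):

```python
# Conceptual sketch only; no MongoDB server involved.
docs = [{"_id": i, "user": f"user{i % 100}"} for i in range(10_000)]

# Collection scan: O(n) over all documents.
scan_hits = [d for d in docs if d["user"] == "user7"]

# Index on "user": built once, then each lookup is near O(1).
index = {}
for d in docs:
    index.setdefault(d["user"], []).append(d)
indexed_hits = index["user7"]

assert scan_hits == indexed_hits
print(len(indexed_hits))  # 100
```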
Target Audience
Anyone planning or considering deploying a MongoDB database backend
Prerequisites
None
Expected Duration (hours)
1.5
Lesson Objectives...
Overview/Description
MongoDB is an open-source document database that is easy to scale and develop against. MongoDB can be installed on various operating systems with minimal requirements and provides CRUD operations to read and write documents within collections. This course will discuss the concepts of MongoDB and demonstrate how to install it on Linux and Windows. It will also demonstrate how to create and manage documents as well as how to query them using the find() method and aggregation.
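The find() method mentioned above takes a filter document and returns the documents that match it. As a minimal sketch of its basic equality-match semantics, simulated in plain Python with no server (the sample documents are invented for illustration):

```python
collection = [
    {"name": "Ana", "city": "Zagreb", "age": 31},
    {"name": "Ben", "city": "Split", "age": 27},
    {"name": "Cara", "city": "Zagreb", "age": 45},
]

def find(docs, query):
    # A document matches when every field in the filter equals the
    # document's value -- the rule find({"city": "Zagreb"}) applies
    # for simple equality filters.
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]

matches = find(collection, {"city": "Zagreb"})
print([d["name"] for d in matches])  # ['Ana', 'Cara']
```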
Target Audience
Anyone planning or considering deploying a MongoDB database backend...