Big Data Analytics
Summary
Data scientists use their data and analytical ability to find and interpret rich data sources; manage large amounts of data despite hardware, software, and bandwidth constraints; merge data sources; ensure consistency of datasets; create visualizations to aid in understanding data; build mathematical models using the data; and present and communicate the data insights/findings. They are often expected to produce answers in days rather than months, work by exploratory analysis and rapid iteration, and to pro
Things to Remember
- Data scientists are the skilled personnel who can interpret data.
- The core of the data scientist's role is data analytics.
- The NoSQL takeover and the maturing Hadoop environment are important current trends in big data.
Role of a Data Scientist
The job title "data scientist" has risen alongside the relatively new technology of big data. Although the role is not tied exclusively to big data projects, it complements them well because of the much larger volume of data being examined compared with traditional roles and methods.
So what does a data scientist do?
A data scientist represents an evolution from the business or data analyst role. The formal training is similar, with a solid foundation in computer science and applications, modeling, analytics, statistics and math. What sets the data scientist apart is strong business awareness and judgment, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. A good data scientist will not just address business problems but will pick the right problems, the ones that have the most value to the organization.
A data scientist is someone who is inquisitive, who can stare at data and spot trends. It is almost like being a Renaissance individual who really wants to learn and bring change to an organization.
Whereas a traditional data analyst may look only at data from a single source (a CRM, or customer relationship management, system, for example), a data scientist will most likely explore and examine data from multiple disparate sources. The data scientist sifts through all incoming data with the goal of discovering a previously hidden insight, which can provide a competitive advantage or address a pressing business problem. A data scientist doesn't simply collect and report on data, but also looks at it from various angles, determines what it means, and then recommends ways to apply the data.
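As a minimal sketch of examining data from more than one source, the Python snippet below combines hypothetical CRM records with hypothetical web-visit logs on a shared customer ID. The field names (`customer_id`, `segment`, `pages_viewed`) and all the data are invented for illustration; a real project would pull these from databases or files.

```python
from collections import defaultdict

# Hypothetical source 1: customer records from a CRM system.
crm_records = [
    {"customer_id": 1, "name": "Customer A", "segment": "retail"},
    {"customer_id": 2, "name": "Customer B", "segment": "corporate"},
    {"customer_id": 3, "name": "Customer C", "segment": "retail"},
]

# Hypothetical source 2: web-visit logs from a different system.
web_visits = [
    {"customer_id": 1, "pages_viewed": 12},
    {"customer_id": 1, "pages_viewed": 3},
    {"customer_id": 3, "pages_viewed": 7},
]


def combine_sources(crm, visits):
    """Left-join CRM records with the total pages viewed per customer."""
    # Aggregate the second source by the shared key.
    pages_by_customer = defaultdict(int)
    for visit in visits:
        pages_by_customer[visit["customer_id"]] += visit["pages_viewed"]

    # Attach the aggregate to every CRM record (0 if the customer never visited).
    combined = []
    for record in crm:
        merged = dict(record)
        merged["total_pages"] = pages_by_customer.get(record["customer_id"], 0)
        combined.append(merged)
    return combined


if __name__ == "__main__":
    for row in combine_sources(crm_records, web_visits):
        print(row)
```

Keeping customers who have no visits (a left join) rather than dropping them is a deliberate choice here: a customer who never visits the website may itself be a useful insight.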
Data scientists are inquisitive: they explore, ask questions, do what-if analysis, and question existing processes and assumptions. Armed with data and analytical results, a strong data scientist then communicates informed conclusions and recommendations across an organization’s leadership structure.[4]

Data skills you need to be a data scientist [4]:
1. Basic Tools: No matter what type of company you’re interviewing for, you’re likely going to be expected to know how to use the tools of the trade. This means a statistical programming language, like R or Python, and a database querying language like SQL.
2. Basic Statistics
3. Machine Learning
4. Multivariable Calculus and Linear Algebra
5. Data Munging:
Most real-world data is messy and really difficult to work with, so it is important to know how to deal with its imperfections: missing values, inconsistent string formatting (e.g., ‘New York’ versus ‘new york’ versus ‘ny’), inconsistent date formatting (‘2014-01-01’ vs. ‘01/01/2014’), Unix time vs. timestamps, etc. Keeping data clean is most important at small companies where you’re an early data hire, or at data-driven companies where the product is not data-related.
6. Data Visualization & Communication
7. Software Engineering
8. Thinking Like a Data Scientist: you need to be a data-driven problem solver.
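The data-munging imperfections described in item 5 can be illustrated with a small, standard-library Python sketch that normalizes city names and dates. The alias table and the two accepted date formats are assumptions chosen to match the examples in the text; real data usually needs a much larger mapping and more formats.

```python
from datetime import datetime, timezone

# Hypothetical alias table mapping messy spellings to one canonical form.
CITY_ALIASES = {
    "new york": "New York",
    "ny": "New York",
}

# Date formats assumed to appear in the raw data (ISO style and US style).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def clean_city(raw):
    """Return a canonical city name, or None for a missing value."""
    if raw is None or not raw.strip():
        return None
    key = raw.strip().lower()
    return CITY_ALIASES.get(key, raw.strip().title())


def clean_date(raw):
    """Parse a date string or Unix timestamp into an ISO 'YYYY-MM-DD' string."""
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        # Treat numbers as Unix time (seconds since 1970-01-01 UTC).
        return datetime.fromtimestamp(raw, tz=timezone.utc).date().isoformat()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # Unrecognized format: flag it as missing rather than guess.


if __name__ == "__main__":
    rows = [("New York", "2014-01-01"), ("new york", "01/01/2014"), ("ny", 1388534400), ("", None)]
    for city, date in rows:
        print(clean_city(city), clean_date(date))
```

Returning `None` for an unrecognized date, instead of guessing, keeps bad values visible so they can be counted and handled explicitly later.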
In short, a data scientist needs these kinds of skills:
- Business domain expertise and strong analytical skills.
- Creativity and good communication.
- Knowledge of statistics, machine learning and data visualization.
- Ability to develop data analysis solutions using modeling/analysis methods, frameworks and languages such as Map-Reduce, R, SAS, etc.
- Adeptness at data engineering, including discovering and mashing/blending large amounts of data.
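Since Map-Reduce appears among the methods listed above, here is a minimal single-process Python sketch of the MapReduce model applied to word counting: a map step emits (word, 1) pairs, a shuffle step groups the pairs by key, and a reduce step sums each group. This only simulates the model in memory; frameworks such as Hadoop distribute these same steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Run map over every input document, then shuffle and reduce.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    docs = ["big data needs big tools", "data scientists explore data"]
    print(word_count(docs))
```

Because each map call depends only on its own document and each reduce call only on one key's values, both steps can run in parallel; that independence is what lets a real cluster scale the same logic.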
Data scientists use an investigative computing platform to:
- Bring unmodeled, multi-structured data into an investigative data store for experimentation.
- Deal with unstructured, semi-structured and structured data from various sources.
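To make the three kinds of data concrete, the sketch below reads one structured source (CSV), one semi-structured source (JSON) and one unstructured source (free text) into the same simple record shape. The sample inputs, the `customer`/`rating` fields and the keyword-based rating rule for free text are all hypothetical simplifications, not a recommended text-analysis method.

```python
import csv
import io
import json

# Hypothetical structured source: fixed columns, like a database table export.
CSV_TEXT = "customer,rating\nA,5\nB,3\n"

# Hypothetical semi-structured source: JSON with nested, optional fields.
JSON_TEXT = '[{"customer": "C", "feedback": {"rating": 4}}, {"customer": "D", "feedback": {}}]'

# Hypothetical unstructured source: free-text comments with no schema.
TEXT_LINES = ["E: great service, very happy", "F: slow delivery, unhappy"]


def from_structured(text):
    reader = csv.DictReader(io.StringIO(text))
    return [{"customer": row["customer"], "rating": int(row["rating"])} for row in reader]


def from_semi_structured(text):
    records = []
    for item in json.loads(text):
        # Nested fields may be missing, so fall back to None.
        rating = item.get("feedback", {}).get("rating")
        records.append({"customer": item["customer"], "rating": rating})
    return records


def from_unstructured(lines):
    records = []
    for line in lines:
        customer, _, comment = line.partition(":")
        # Crude assumption: infer a rating from keywords ("unhappy" checked first).
        if "unhappy" in comment:
            rating = 1
        elif "happy" in comment:
            rating = 5
        else:
            rating = None
        records.append({"customer": customer.strip(), "rating": rating})
    return records


def load_all():
    """Bring all three sources into one list of uniform records."""
    return (
        from_structured(CSV_TEXT)
        + from_semi_structured(JSON_TEXT)
        + from_unstructured(TEXT_LINES)
    )


if __name__ == "__main__":
    for record in load_all():
        print(record)
```

The effort rises from structured to unstructured data: the CSV needs only type conversion, the JSON needs defensive handling of missing fields, and the free text needs an interpretation rule that is itself a modeling assumption.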
Data scientists help broaden the business scope of investigative computing in three areas:
- New sources of data: supports access to multi-structured data.
- New and improved analysis techniques: enables sophisticated analytical processing of multi-structured data using techniques such as Map-Reduce and in-database analytic functions.
- Improved data management and performance: provides an improved price/performance ratio for processing multi-structured data using non-relational systems such as Hadoop, relational DBMSs, and integrated hardware/software.
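One way to picture "in-database analytic functions" is to push an aggregation into the database engine instead of pulling every row into application code. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in; the `sales` table and its columns are hypothetical, and a production system would use a large-scale relational or analytic DBMS.

```python
import sqlite3


def region_totals(rows):
    """Load sales rows, then let the database compute per-region aggregates."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
        # The GROUP BY aggregation runs inside the database engine,
        # so only the small summary result is returned to Python.
        cursor = conn.execute(
            "SELECT region, COUNT(*), SUM(amount), AVG(amount) "
            "FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("east", 100.0), ("west", 250.0), ("east", 50.0)]
    for region, count, total, average in region_totals(sample):
        print(region, count, total, average)
```

With three rows the difference is invisible, but on very large tables returning one summary row per region instead of every raw row is what makes in-database processing attractive.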
Goals of data analytics in the role of the data scientist:
- Recognize and reflect the two-phase nature of analytic processes (discovery and application).
- Provide guidance for companies about how to establish that their use of data for knowledge discovery is a legitimate business purpose.
- Emphasize the need to establish accountability through an internal privacy program that identifies and mitigates the risks that using data for analytics may raise for individuals.
- Take into account that analytics may be an iterative process using data from a variety of sources.
Current trends in big data analytics
- An iterative process (discovery and application), which in general involves:
  - Analyzing the unstructured data (data analytics)
  - Developing algorithms (data analytics)
  - Scrubbing the data (data engineer)
  - Presenting structured data (relationships, associations)
  - Refining the data (data scientist)
  - Processing the data with a distributed engine, e.g., on HDFS (software engineer), and writing it to a NoSQL database (Elasticsearch, HBase, MongoDB, Cassandra, etc.)
  - Visual presentation in application software
  - QC (quality control) verification
  - Client release
- Deep learning
- In-memory analytics, e.g., hybrid transaction/analytical processing (HTAP), which allows transactions and analytic processing to reside in the same in-memory database
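The HTAP idea, where transactions and analytics share one in-memory database, can be sketched with Python's `sqlite3` module and a `:memory:` database. SQLite is not an HTAP product; it only stands in here to show the pattern of committing transactions and then running an analytic query against the same live data, without copying it to a separate warehouse first. The `orders` table and the `InMemoryStore` class are hypothetical.

```python
import sqlite3


class InMemoryStore:
    """Toy store where transactional writes and analytical reads share one database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, qty INTEGER)"
        )

    def place_order(self, product, qty):
        # Transactional write: the 'with' block commits on success
        # and rolls back if an exception is raised.
        with self.conn:
            self.conn.execute(
                "INSERT INTO orders (product, qty) VALUES (?, ?)", (product, qty)
            )

    def units_by_product(self):
        # Analytical read over the same, freshly committed data.
        return self.conn.execute(
            "SELECT product, SUM(qty) FROM orders GROUP BY product ORDER BY product"
        ).fetchall()

    def close(self):
        self.conn.close()


if __name__ == "__main__":
    store = InMemoryStore()
    store.place_order("laptop", 2)
    store.place_order("phone", 5)
    store.place_order("laptop", 1)
    print(store.units_by_product())
    store.close()
```

The point of the pattern is freshness: the analytic query sees each order as soon as its transaction commits, with no batch export step in between.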
Top 8 trends for big data in 2016 [2]
- A bigger, better 'NoSQL' takeover
- 'Apache Spark' lights up big data: Spark processes data dramatically faster than Hadoop MapReduce and is now the largest open-source big data project
- Hadoop projects mature: enterprises continue their move from Hadoop proofs of concept to production
- Big data grows up: Hadoop adds to enterprise standards
- Big data gets fast: options expand to add speed to Hadoop
- The number of options for ‘preparing’ end users to discover all forms of data grows
- MPP data warehouse growth is heating up in the cloud
- The buzzwords converge: IoT, cloud and big data come together
References:
[1] "Demystifying Data Science: 4 jobs and 8 skills", blog.udacity.com
[2] "big data trends", computerworld.com
[3] "What is big data?", www.sas.com/en_th
[4] "Data Scientist", www.ibm.com/analytics
[5] Vincent Granville, "Developing Analytic Talent: Becoming a Data Scientist", page 11