14 July, 2019

Introduction to BIG DATA: What is, Types, Characteristics & Example

What is Data?

Data are the quantities, characters, or symbols on which a computer performs operations. Data may be stored and transmitted as electrical signals and recorded on magnetic, optical, or mechanical recording media.

What is Big Data?

Big Data is also data, but of enormous size. The term describes a collection of data that is huge in volume and keeps growing exponentially with time. In short, such data is so large and complex that traditional data management tools cannot store or process it efficiently.

Examples Of Big Data

Following are some examples of Big Data:
The New York Stock Exchange generates about one terabyte of new trade data per day.
Social Media
Statistics show that 500+ terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is mainly generated through photo and video uploads, message exchanges, comments, etc.
A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With many thousand flights per day, data generation reaches many petabytes.
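The jet engine figure can be checked with a quick back-of-the-envelope calculation. The sketch below is only illustrative: the 10 TB per 30 minutes comes from the text, while the average flight length and the number of flights per day are assumed values chosen for the example, and the function name is hypothetical.

```python
TB_PER_HALF_HOUR = 10    # from the text: 10+ TB per 30 minutes of flight
FLIGHT_HOURS = 2         # assumption: average flight length in hours
FLIGHTS_PER_DAY = 5000   # assumption: stands in for "many thousand flights"
TB_PER_PB = 1000         # decimal units: 1 petabyte = 1000 terabytes


def daily_petabytes(tb_per_half_hour, flight_hours, flights_per_day):
    """Estimate petabytes generated per day by one engine per flight."""
    tb_per_flight = tb_per_half_hour * 2 * flight_hours
    return tb_per_flight * flights_per_day / TB_PER_PB


print(daily_petabytes(TB_PER_HALF_HOUR, FLIGHT_HOURS, FLIGHTS_PER_DAY))  # 200.0
```

Even with these modest assumptions the total lands in the hundreds of petabytes per day, which is consistent with the claim above.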

Types Of Big Data

Big Data can be found in three forms:
  1. Structured
  2. Unstructured
  3. Semi-structured

Structured

Any data that can be stored, accessed, and processed in a fixed format is termed 'structured' data. Over time, computer science has achieved great success in developing techniques for working with this kind of data (where the format is well known in advance) and for deriving value from it. However, problems now arise when the size of such data grows to a huge extent, with typical sizes in the range of multiple zettabytes.
Do you know? 10^21 bytes equal 1 zettabyte; put another way, one billion terabytes form a zettabyte.
Looking at these figures one can easily understand why the name Big Data is given and imagine the challenges involved in its storage and processing.
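The zettabyte conversion can be verified with integer arithmetic. This is a minimal sketch using decimal (SI) units, where a terabyte is 10^12 bytes; the constant and function names are made up for the example.

```python
BYTES_PER_TB = 10**12   # decimal (SI) terabyte
BYTES_PER_ZB = 10**21   # decimal (SI) zettabyte


def zettabytes_to_terabytes(zb):
    """Convert a whole number of zettabytes to terabytes."""
    return zb * BYTES_PER_ZB // BYTES_PER_TB


print(zettabytes_to_terabytes(1))  # 1000000000, i.e. one billion terabytes
```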
Do you know? Data stored in a relational database management system is one example of 'structured' data.
Examples Of Structured Data
An 'Employee' table in a database is an example of Structured Data
Employee_ID  Employee_Name    Gender  Department  Salary_In_lacs
2365         Rajesh Kulkarni  Male    Finance     650000
3398         Pratibha Joshi   Female  Admin       650000
7465         Shushil Roy      Male    Admin       500000
7500         Shubhojit Das    Male    Finance     500000
7699         Priya Sane       Female  Finance     550000
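To illustrate why a fixed format makes such data easy to work with, the sketch below loads the 'Employee' table into an in-memory SQLite database and runs an aggregate query. The choice of SQLite and the column types are assumptions made for the example; the rows and column names come from the table above.

```python
import sqlite3

rows = [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
    (7465, "Shushil Roy", "Male", "Admin", 500000),
    (7500, "Shubhojit Das", "Male", "Finance", 500000),
    (7699, "Priya Sane", "Female", "Finance", 550000),
]

conn = sqlite3.connect(":memory:")
# The schema is declared up front: this is what makes the data "structured".
conn.execute(
    """CREATE TABLE Employee (
           Employee_ID    INTEGER PRIMARY KEY,
           Employee_Name  TEXT NOT NULL,
           Gender         TEXT NOT NULL,
           Department     TEXT NOT NULL,
           Salary_In_lacs INTEGER NOT NULL
       )"""
)
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", rows)

# Because every row has the same known columns, aggregation is one query.
avg_by_dept = dict(
    conn.execute(
        "SELECT Department, AVG(Salary_In_lacs) FROM Employee "
        "GROUP BY Department ORDER BY Department"
    )
)
print(avg_by_dept)
```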

Unstructured

Any data whose form or structure is unknown is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges in processing it to derive value. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc. Nowadays organizations have a wealth of data available to them, but unfortunately they do not know how to derive value from it because the data is in raw, unstructured form.
Examples Of Unstructured Data
The output returned by 'Google Search'

Semi-structured

Semi-structured data can contain both forms of data. Semi-structured data looks structured in form, but it is not actually defined by, for example, a table definition in a relational DBMS. An example of semi-structured data is data represented in an XML file.
Examples Of Semi-structured Data
Personal data stored in an XML file-
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
Figure: Data growth over the years
Please note that web application data, which is unstructured, consists of log files, transaction history files, etc. OLTP systems are built to work with structured data, in which data is stored in relations (tables).
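The XML records in this section can be read without any table definition, which is what makes them semi-structured: the tags describe each field, but no schema is enforced. The sketch below parses them with Python's standard library. The wrapping `<people>` root element is an assumption added here, because an XML document needs a single root and the records above have none.

```python
import xml.etree.ElementTree as ET

records = """
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
"""

# Hypothetical root element so the fragment parses as one document.
root = ET.fromstring(f"<people>{records}</people>")

# Each record becomes a dict keyed by whatever tags it happens to contain.
people = [{field.tag: field.text for field in rec} for rec in root.findall("rec")]
ages = [int(person["age"]) for person in people]

print(len(people), max(ages))  # 5 41
```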

Characteristics Of Big Data

(i) Volume – The name Big Data itself refers to enormous size. The size of data plays a crucial role in determining its value. Whether particular data can be considered Big Data at all also depends on its volume. Hence, 'Volume' is one characteristic that needs to be considered when dealing with Big Data.
(ii) Variety – The next aspect of Big Data is its variety.
Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only sources of data considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This variety of unstructured data poses certain issues for storing, mining, and analyzing data.
(iii) Velocity – The term 'velocity' refers to the speed at which data is generated. How fast data is generated and processed to meet demand determines the real potential in the data.
Big Data velocity deals with the speed at which data flows in from sources like business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.
(iv) Variability – This refers to the inconsistency that data can show at times, which hampers the ability to handle and manage the data effectively.

Benefits of Big Data Processing

The ability to process Big Data brings multiple benefits, such as:
    • Businesses can utilize outside intelligence while making decisions
Access to social data from search engines and sites like Facebook and Twitter enables organizations to fine-tune their business strategies.
    • Improved customer service
Traditional customer feedback systems are getting replaced by new systems designed with Big Data technologies. In these new systems, Big Data and natural language processing technologies are being used to read and evaluate consumer responses.
    • Early identification of risk to the product/services, if any
    • Better operational efficiency
Big Data technologies can be used to create a staging area or landing zone for new data before identifying what data should be moved to the data warehouse. In addition, integrating Big Data technologies with a data warehouse helps an organization offload infrequently accessed data.

Summary

  • Big Data is defined as data that is huge in size. The term describes a collection of data that is huge in size and yet growing exponentially with time.
  • Examples of Big Data generation include stock exchanges, social media sites, jet engines, etc.
  • Big Data can be 1) Structured, 2) Unstructured, or 3) Semi-structured
  • Volume, Variety, Velocity, and Variability are a few characteristics of Big Data
  • Improved customer service, better operational efficiency, and better decision making are a few advantages of Big Data

01 November, 2017

Agile project management

1. Agile project management focuses on continuous improvement, scope flexibility, team input, and delivering essential quality products. Agile project management approaches include scrum as a framework, extreme programming (XP) for building in quality upfront, and lean thinking to eliminate waste.


2. Agile project management is an iterative approach to managing software development projects that focuses on continuous releases and incorporating customer feedback with every iteration.
Traditional agile project management can be categorized into two frameworks: scrum and kanban. While scrum is focused on fixed-length project iterations, kanban is focused on continuous releases: upon completing a task, the team immediately moves on to the next one.


How scrum works

Scrum is a framework for agile project management that uses fixed-length iterations of work, called sprints. There are four ceremonies that bring structure to each sprint.

SPRINT PLANNING

A team planning meeting that determines what to complete in the coming sprint.

SPRINT DEMO

A sharing meeting where the team shows what they've shipped in that sprint.

DAILY SCRUM

Also known as a stand-up, a 15-minute mini-meeting for the software team to sync.

RETROSPECTIVE

A review of what did and didn't go well with actions to make the next sprint better.

How kanban works

Kanban is a framework for agile project management that matches the work to the team's capacity. It's focused on getting things done as fast as possible, giving teams the ability to react to change even faster than with scrum.
The kanban framework includes the following components:

LIST OF WORK (OR STORIES)

The list of work, or stories, is the set of issues or tasks that need to get done.

WORK IN PROGRESS (WIP) LIMITS

A rule to limit the amount of work to be done based on the team's capacity. 

CONTINUOUS RELEASES

The team works on the number of stories allowed by the WIP limit and can release at any time.
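The way a WIP limit gates the flow of stories can be sketched as a small data structure. This is only an illustration of the idea described above, not how any particular kanban tool works; the class, method, and story names are all hypothetical.

```python
class WipLimitExceeded(Exception):
    """Raised when starting a story would exceed the WIP limit."""


class KanbanBoard:
    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self.backlog = []       # list of work (stories), in priority order
        self.in_progress = []   # stories currently being worked on
        self.released = []      # stories already shipped

    def add_story(self, story):
        self.backlog.append(story)

    def start_next(self):
        """Pull the next story, but only if the team has capacity."""
        if len(self.in_progress) >= self.wip_limit:
            raise WipLimitExceeded(f"WIP limit of {self.wip_limit} reached")
        if not self.backlog:
            return None
        story = self.backlog.pop(0)
        self.in_progress.append(story)
        return story

    def release(self, story):
        """Continuous release: ship a story as soon as it is done."""
        self.in_progress.remove(story)
        self.released.append(story)


board = KanbanBoard(wip_limit=2)
for name in ["login page", "search", "export"]:
    board.add_story(name)

board.start_next()
board.start_next()
try:
    board.start_next()      # third story does not fit within the limit
    blocked = False
except WipLimitExceeded:
    blocked = True

board.release("login page")  # releasing frees capacity
board.start_next()           # now "export" can start
print(blocked, board.in_progress, board.released)
```

Raising an exception when the limit is hit is one design choice among several; a real board might instead simply refuse the move in the user interface.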

Agile project estimating

Project estimating is an extremely important aspect of both kanban and scrum project management. For kanban, many teams set their WIP limit for each state based on their previous experiences and team size. Scrum teams use project estimating to identify how much work can be done in a particular sprint. Many agile teams adopt unique estimating techniques like planning poker, ideal hours, or story points to determine a numeric value for the task at hand.
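As a rough illustration of how story points can determine what fits in a sprint, the sketch below averages the points completed in past sprints and fills the next sprint in priority order until the next story would not fit. The averaging rule, the cut-off rule, and all numbers and story names are assumptions made for the example; teams use many variations.

```python
def average_velocity(completed_points):
    """Average story points completed per past sprint."""
    return sum(completed_points) / len(completed_points)


def plan_sprint(backlog, capacity):
    """Take stories in priority order until the next one would not fit."""
    planned, used = [], 0
    for name, points in backlog:
        if used + points > capacity:
            break
        planned.append(name)
        used += points
    return planned, used


past_sprints = [21, 18, 24]   # hypothetical points completed per sprint
capacity = average_velocity(past_sprints)

backlog = [                   # hypothetical stories with point estimates
    ("checkout flow", 8),
    ("password reset", 5),
    ("audit log", 5),
    ("dark mode", 8),
]
planned, used = plan_sprint(backlog, capacity)
print(capacity, planned, used)
```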

Agile reporting

Project estimations come into play at the beginning and end of each sprint. They help teams determine what they can get done at the beginning of the sprint, but also show how accurate those initial estimates were at the end. 
  1. Advantages of using agile for a software firm or team:
  • Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
  • Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.
  • Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
  • Business people and developers must work together daily throughout the project.
  • Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
  • The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
  • There is a big community of agile practitioners with whom you can share knowledge.
  • You can detect and fix issues and defects faster
  • Working software is the primary measure of progress.
  • Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
  • Continuous attention to technical excellence and good design enhances agility.
  • Simplicity — the art of maximizing the amount of work not done — is essential.
  • The best architectures, requirements, and designs emerge from self-organizing teams.
  • Developers can improve their coding skills based on QA feedback.
  • You can experiment and test ideas because the costs are low

  2. Disadvantages of agile project management
  • Documentation tends to get sidetracked, which makes it harder for new members to get up to speed
  • It’s more difficult to measure progress than in waterfall because progress happens across several cycles
  • Agile demands more time and energy from everyone because developers and customers must constantly interact with each other
  • When developers run out of work, they can’t work on a different project because they’ll be needed soon
  • Projects can drag on indefinitely because the process has no clearly defined end
  • Clients who work on a specified budget or schedule can’t know how much the project will actually cost, which makes for a very complicated sales cycle (“until the iteration ends” is not something clients like to hear)
  • Teams can get sidetracked into delivering new functionality at the expense of technical debt, which increases the amount of unplanned work
  • Features that are too big to fit into one or even several cycles are avoided because they don’t fit nicely into the philosophy
  • You need a long-term vision for the product and must actively work on communicating it
  • Products lack cohesion and the user journey is fragmented because the design is fragmented. The more time passes, the more disjointed the software ends up.
  • Short cycles don’t leave enough time for the design thinking process, so designers have to redevelop the experience over and over due to negative feedback.


26 October, 2017

C Program to Find LCM and HCF of Two Numbers