What is Big Data?

Big Data

Big Data is the term for data sets so large that conventional tools cannot store or process them comfortably. The data may be structured or unstructured, and handling it requires special tools and techniques. The problems that arise when handling such data fall into three types:

Volume (data at rest): the problem is how to store such a huge amount of data. Example: web logs.

Velocity (data in motion): the problem is handling many requests or events per second. Example: Google search.

Variety (data in many forms): the problem is processing complex data that arrives in different formats. Example: recommendations.

What are the classifications of Data?

Data Classification

• Structured: if we know the fields as well as their datatypes, we call the data structured. Data in relational databases such as MySQL, Oracle, or Microsoft SQL Server is an example of structured data.

• Semi-structured: if we know the fields or columns but not their datatypes, we call the data semi-structured. For example, data in a CSV (comma-separated values) file is semi-structured.

• Unstructured: if the data doesn’t contain columns or fields at all, we call it unstructured. Plain text files and logs generated on a server are examples of unstructured data.
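To make the difference between semi-structured and structured data concrete, here is a minimal Python sketch. The sample CSV text, the column names (user_id, page, duration_sec), and the apply_schema helper are all invented for illustration. It shows that a CSV reader gives us field names but only string values, and that the data becomes structured once we attach a datatype to each field.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
csv_text = "user_id,page,duration_sec\n101,/home,12.5\n102,/search,3.0\n"

# Semi-structured: the header tells us the field names,
# but every value comes back as a plain string.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Hypothetical schema: one datatype per field.
schema = {"user_id": int, "page": str, "duration_sec": float}

def apply_schema(row, schema):
    """Cast each field of a row to the type named in the schema."""
    return {field: cast(row[field]) for field, cast in schema.items()}

# Structured: fields and datatypes are both known.
typed_rows = [apply_schema(row, schema) for row in rows]

print(rows[0])
print(typed_rows[0])
```

A relational database does the same job with its table definition: the column types declared in CREATE TABLE play the role of the schema dictionary here.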

ETL

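The process of turning raw, often unstructured data into structured data is known as ETL (Extract, Transform and Load): the data is extracted from its source, transformed into a fixed set of fields and datatypes, and loaded into a target store such as a database.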
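The three ETL steps can be sketched in a few lines of Python using only the standard library. This is an illustration rather than a production pipeline: the log format, the sample lines, the requests table, and the function names are assumptions made up for the example, and the in-memory SQLite database stands in for whatever target store is actually used.

```python
import re
import sqlite3

# Hypothetical raw log text; the "ip method path status" layout is an assumption.
RAW_LOG = """\
192.0.2.1 GET /home 200
192.0.2.7 GET /search 404
this line is malformed
"""

LINE_PATTERN = re.compile(r"^(\S+) (\S+) (\S+) (\d{3})$")

def extract(raw):
    """Extract: read the raw, unstructured lines from the source."""
    return raw.splitlines()

def transform(lines):
    """Transform: parse each line into typed fields, skipping lines that do not match."""
    records = []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            continue
        ip, method, path, status = match.groups()
        records.append((ip, method, path, int(status)))
    return records

def load(records, conn):
    """Load: write the structured records into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests "
        "(ip TEXT, method TEXT, path TEXT, status INTEGER)"
    )
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_LOG)), conn)

# Once loaded, the data can be queried like any other structured data.
rows = conn.execute(
    "SELECT path, status FROM requests ORDER BY status"
).fetchall()
print(rows)
```

Keeping the three steps as separate functions means the source (a file, an API) or the target (another database) can be swapped without touching the parsing logic.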
