Big Data
Big Data is a term for data sets of very large size. The data may be structured or unstructured. Handling such data requires special tools and techniques. The problems involved in handling it can be classified into three types:
Volume (data at rest) - The problem is how to store such a huge amount of data. E.g. web logs.
Velocity (data in motion) - The problem is handling many requests per second. E.g. Google search.
Variety (data in many forms) - The problem is processing complex data that arrives in different formats. E.g. recommendations.
What are the classifications of Data?
Data Classification
• Structured - If we know the fields as well as their datatypes, we call the data structured. The data in relational databases such as MySQL, Oracle or Microsoft SQL Server is an example of structured data.
• Semi-Structured - If we know the fields or columns but not their datatypes, we call the data semi-structured. For example, data in a CSV (comma-separated values) file is semi-structured.
• Unstructured - If our data doesn’t contain columns or fields, we call it unstructured. Plain text files and logs generated on a server are examples of unstructured data.
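The difference between semi-structured and structured data can be sketched in a few lines of Python. This is only an illustration: the CSV content, the `SCHEMA` mapping and the helper names `read_rows` and `apply_types` are invented for the example. A CSV file gives us the field names, but every value arrives as text until we supply the datatypes ourselves.

```python
import csv
import io

# Hypothetical CSV content, invented for illustration.
CSV_TEXT = "name,age,city\nAsha,31,Pune\nRavi,27,Delhi\n"


def read_rows(text):
    """Read CSV text into dictionaries; fields are known, but all values are strings."""
    return list(csv.DictReader(io.StringIO(text)))


def apply_types(rows, schema):
    """Cast each field with the function given in schema, producing typed rows."""
    return [
        {field: cast(row[field]) for field, cast in schema.items()}
        for row in rows
    ]


# Semi-structured: we know the columns, but "31" is still just text.
raw_rows = read_rows(CSV_TEXT)

# Structured: an assumed schema supplies the datatype of every field.
SCHEMA = {"name": str, "age": int, "city": str}
typed_rows = apply_types(raw_rows, SCHEMA)

print(raw_rows[0])
print(typed_rows[0])
```

A relational database stores the equivalent of `SCHEMA` as part of the table definition, which is why its data counts as structured from the start.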
ETL
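The process of translating unstructured data into structured data is known as ETL (Extract, Transform and Load): the data is extracted from its source, transformed into fields with known datatypes, and loaded into a target store such as a database.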
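As a rough sketch of the three steps, the Python example below turns a few log lines into rows of a database table. The log format, the `requests` table and all function names are assumptions made for this illustration, and an in-memory SQLite database stands in for the real target store.

```python
import re
import sqlite3

# Hypothetical log lines, invented for illustration.
RAW_LOGS = [
    "2024-01-05 10:15:32 GET /index.html 200",
    "2024-01-05 10:15:40 POST /login 302",
    "malformed line without the expected fields",
]

# Assumed log format: date, time, HTTP method, path, status code.
LINE_PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (\S+) (\d{3})$"
)


def extract(lines):
    """Extract: read raw lines from the source (an in-memory list here)."""
    for line in lines:
        yield line.strip()


def transform(lines):
    """Transform: parse each line into typed fields, skipping lines that do not match."""
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            continue
        date, time, method, path, status = match.groups()
        yield (f"{date} {time}", method, path, int(status))


def load(rows, connection):
    """Load: write the structured rows into a table with known column types."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS requests "
        "(ts TEXT, method TEXT, path TEXT, status INTEGER)"
    )
    connection.executemany("INSERT INTO requests VALUES (?, ?, ?, ?)", rows)
    connection.commit()


def run_etl(lines):
    """Run the three steps in order and return the database connection."""
    connection = sqlite3.connect(":memory:")
    load(transform(extract(lines)), connection)
    return connection


connection = run_etl(RAW_LOGS)
rows = connection.execute(
    "SELECT ts, method, path, status FROM requests ORDER BY ts"
).fetchall()
print(rows)
```

Lines that do not fit the assumed format are simply dropped here; a real pipeline would more likely log or quarantine them so that no data is lost silently.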