DATA PRODUCTS AND BASIC PYTHON FUNCTIONS



Simply put, a data product is the output of any data science activity. But why do we need one? The answer varies, but the common thread is making data help us take actions that serve an end goal.
A data product is a product that facilitates an end goal through the use of data. This is a useful definition because it speaks to why we generate a data product in the first place: the purpose of the activity is to put insights to use in production so we can take action. An API is a common data product, one that others who are building data products from the same dataset can use.
Derived data is data that has been cleaned and prepared for analysis.

For example, data science can help a service with a recommender system increase its sales by targeting the right consumers.
This can be modeled using data on customer choices, profiles, and locations.
Once you define these objectives, or more generally the opportunities for data products to turn data into an advantage for your business, you can look at what you have, analyze the gaps, and prioritize the actions to get there. What are the privacy concerns? Who should have access to, or control over, the data related to this product? What is the lifetime of the data? (The lifetime of data is sometimes described in terms of its volatility.) When building a data product strategy, it is important to integrate data collection and modeling with business objectives.

CSV, TSV & JSON

JSON comes in where the fixed number of columns in CSV/TSV becomes a limitation. If we have data that we need to represent using structures like lists or nested records, that is inconvenient to do in the CSV/TSV format. JSON addresses this by allowing more general forms of structured data, so it can hold complex data structures.
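
To make the difference concrete, here is a minimal sketch that reads the same kind of record in both formats using only the standard library. The field names and values (user_id, rating, tags, location) are made up for illustration.

```python
import csv
import io
import json

# Hypothetical tab-separated data: every row has the same fixed columns.
tsv_text = "user_id\trating\nu1\t5\nu2\t3\n"
rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
header, records = rows[0], rows[1:]

# Hypothetical JSON record: a field can hold a list or a nested object.
json_text = (
    '{"user_id": "u1", "rating": 5, '
    '"tags": ["fast", "cheap"], "location": {"city": "Lima"}}'
)
review = json.loads(json_text)

print(header)                      # column names from the TSV header row
print(records[0])                  # TSV values arrive as strings
print(review["tags"])              # a list stored inside one record
print(review["location"]["city"])  # a nested field
```

Note that the TSV reader returns every value as a string, while json.loads keeps the rating as a number and the tags as a list.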
WORKING WITH ZIPS
To keep working with the compressed file and save some space, rather than unpacking it first, we have to import the gzip module.
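
A minimal sketch of reading a compressed file with gzip. The file name and its contents are invented for the example, and the file is written to a temporary folder first so the snippet runs on its own.

```python
import gzip
import os
import tempfile

# Hypothetical file name and contents, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "reviews.tsv.gz")

# Write a small compressed file; "wt" opens it in text mode.
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write("user_id\trating\nu1\t5\n")

# Read it back line by line; the file on disk stays compressed.
with gzip.open(path, "rt", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)
```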
MANAGING NUMBERS

WHEN MAKING A FUNCTION
TOKENIZATION: The act of splitting a string into "tokens" by a defined delimiter
KEY ERROR (KeyError): raised when we look up a key that is not in the dictionary.
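
Both ideas can be seen in one small function. This is only a sketch; the function name count_tokens and the sample sentence are made up for the example.

```python
def count_tokens(text, delimiter=" "):
    """Split text on the delimiter and count how often each token occurs."""
    counts = {}
    for token in text.split(delimiter):
        # counts[token] += 1 would raise KeyError the first time a token
        # is seen, so dict.get() supplies 0 when the key is missing.
        counts[token] = counts.get(token, 0) + 1
    return counts


counts = count_tokens("the cat saw the dog")
print(counts["the"])  # 2

# Looking up a token that never appeared raises KeyError.
try:
    counts["bird"]
except KeyError as err:
    print("missing key:", err)
```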

ABOUT DATES IN PYTHON
UNIX TIME is the number of seconds since January 1, 1970, in the UTC timezone
  • time.strptime: time string --> structured time object
  • time.strftime: structured time object --> time string
  • time.mktime/calendar.timegm: structured time object --> number
  • time.gmtime: number --> structured time object
 mktime() assumes the structured time is in local time; calendar.timegm() assumes it is in UTC. gmtime() returns a structured time in UTC.
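
The conversions above can be chained into a round trip. A minimal sketch, assuming a made-up timestamp string and format:

```python
import calendar
import time

time_string = "2000-01-01 00:00:00"  # hypothetical timestamp, treated as UTC
fmt = "%Y-%m-%d %H:%M:%S"

# time string --> structured time object
parsed = time.strptime(time_string, fmt)

# structured time object --> number, reading the struct as UTC
unix_seconds = calendar.timegm(parsed)

# structured time object --> number, reading the struct as local time;
# the result depends on the timezone of the machine running the code
local_seconds = time.mktime(parsed)

# number --> structured time object (UTC) --> time string
round_trip = time.strftime(fmt, time.gmtime(unix_seconds))

print(unix_seconds)  # 946684800
print(round_trip)    # 2000-01-01 00:00:00
```

Using calendar.timegm together with time.gmtime keeps everything in UTC, so the round trip gives back the original string no matter where the code runs.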
STORING DATA
DEFAULTDICT is a dictionary-like data structure that supplies a default value for missing keys; first you have to import it from the collections module.
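
A minimal sketch of grouping values with defaultdict; the (user, rating) pairs are invented for the example.

```python
from collections import defaultdict

# Hypothetical (user, rating) pairs.
ratings = [("u1", 5), ("u2", 3), ("u1", 4)]

# A missing key starts out as an empty list, so no KeyError is raised
# the first time a user is seen.
ratings_per_user = defaultdict(list)
for user, rating in ratings:
    ratings_per_user[user].append(rating)

print(dict(ratings_per_user))
```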
MAKING ARRAYS 
First you have to import the numpy package
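
A minimal sketch of creating arrays once numpy is imported (it is a third-party package, so it has to be installed first). The numbers are made up for illustration.

```python
import numpy as np

# Build a 2x3 array from a nested list of hypothetical ratings.
ratings = np.array([[5, 3, 4], [2, 4, 1]])

# Build an array of the same shape filled with zeros.
zeros = np.zeros((2, 3))

print(ratings.shape)        # (2, 3)
print(ratings.sum(axis=0))  # column totals
print(ratings.mean())       # average over all entries
```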
Some useful functions 

BeautifulSoup
  • parses the HTML contents of a given webpage to extract desired text
  • requires minimal setup to use
  • BeautifulSoup can be used to traverse any HTML page we want to parse.
urllib
  • helps us get the HTML contents of a webpage, while BeautifulSoup helps us parse HTML.
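
The two libraries can be combined as in the sketch below. BeautifulSoup is a third-party package (installed as beautifulsoup4). The page content is invented, and it is packed into a data: URL only so the example runs without a network connection; with a real page you would pass its http(s) address to urlopen instead.

```python
import base64
import urllib.request

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical page, packed into a data: URL so the sketch runs offline.
page = (
    "<html><body><h1>Reviews</h1>"
    "<p class='review'>Great</p><p class='review'>Okay</p>"
    "</body></html>"
)
encoded = base64.b64encode(page.encode("utf-8")).decode("ascii")
url = "data:text/html;base64," + encoded

# urllib gets the raw HTML contents ...
with urllib.request.urlopen(url) as response:
    html = response.read().decode("utf-8")

# ... and BeautifulSoup parses them so we can extract the desired text.
soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1").get_text()
reviews = [p.get_text() for p in soup.find_all("p", class_="review")]

print(title)
print(reviews)
```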
