A dataset of features describing articles published by Mashable over a period of two years, from January 7, 2013, to January 7, 2015. The data were collected to predict the number of times each news article is shared on social networks. Compared to the source, seven features are excluded: two (url and a timestamp) are not relevant predictors, and five are redundant or nearly redundant and would lead to collinearity.

Usage

news

Format

A data frame with 39,644 rows and 54 columns:

n_tokens_title

Number of words in the title

n_tokens_content

Number of words in the content

n_unique_tokens

Rate of unique words in the content

num_hrefs

Number of links

num_self_hrefs

Number of links to other articles published by Mashable

num_imgs

Number of images

num_videos

Number of videos

average_token_length

Average length of the words in the content

num_keywords

Number of keywords in the metadata

data_channel_is_lifestyle

Is data channel 'Lifestyle'?

data_channel_is_entertainment

Is data channel 'Entertainment'?

data_channel_is_bus

Is data channel 'Business'?

data_channel_is_socmed

Is data channel 'Social Media'?

data_channel_is_tech

Is data channel 'Tech'?

data_channel_is_world

Is data channel 'World'?

kw_min_min

Worst keyword (min. shares)

kw_max_min

Worst keyword (max. shares)

kw_avg_min

Worst keyword (avg. shares)

kw_max_max

Best keyword (max. shares)

kw_avg_max

Best keyword (avg. shares)

kw_min_avg

Avg. keyword (min. shares)

kw_max_avg

Avg. keyword (max. shares)

kw_avg_avg

Avg. keyword (avg. shares)

self_reference_min_shares

Min. shares of referenced articles in Mashable

self_reference_avg_sharess

Avg. shares of referenced articles in Mashable

weekday_is_monday

Was the article published on a Monday?

weekday_is_tuesday

Was the article published on a Tuesday?

weekday_is_wednesday

Was the article published on a Wednesday?

weekday_is_thursday

Was the article published on a Thursday?

weekday_is_friday

Was the article published on a Friday?

weekday_is_saturday

Was the article published on a Saturday?

weekday_is_sunday

Was the article published on a Sunday?

LDA_00

Closeness to LDA topic 0

LDA_01

Closeness to LDA topic 1

LDA_02

Closeness to LDA topic 2

LDA_03

Closeness to LDA topic 3

LDA_04

Closeness to LDA topic 4

global_subjectivity

Text subjectivity

global_sentiment_polarity

Text sentiment polarity

global_rate_positive_words

Rate of positive words in the content

global_rate_negative_words

Rate of negative words in the content

rate_positive_words

Rate of positive words among non-neutral tokens

rate_negative_words

Rate of negative words among non-neutral tokens

avg_positive_polarity

Avg. polarity of positive words

min_positive_polarity

Min. polarity of positive words

max_positive_polarity

Max. polarity of positive words

avg_negative_polarity

Avg. polarity of negative words

min_negative_polarity

Min. polarity of negative words

max_negative_polarity

Max. polarity of negative words

title_subjectivity

Title subjectivity

title_sentiment_polarity

Title polarity

abs_title_subjectivity

Absolute subjectivity level of the title

abs_title_sentiment_polarity

Absolute polarity level of the title

shares

Number of shares (target)
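
Several columns above are indicator variables, for example the seven weekday_is_* columns. The following is a minimal sketch of how such indicators might be collapsed into a single factor. It runs on a small made-up data frame (toy, a hypothetical stand-in, not real rows of news) and assumes the indicators are coded 0/1 or TRUE/FALSE with exactly one set per row; the actual storage type in news should be checked first, for instance with str(news).

```r
# Toy stand-in for three rows of `news` (values invented for illustration).
# Assumption: the weekday indicators are coded 0/1 (or TRUE/FALSE).
toy <- data.frame(
  weekday_is_monday    = c(1, 0, 0),
  weekday_is_tuesday   = c(0, 0, 1),
  weekday_is_wednesday = c(0, 0, 0),
  weekday_is_thursday  = c(0, 0, 0),
  weekday_is_friday    = c(0, 0, 0),
  weekday_is_saturday  = c(0, 1, 0),
  weekday_is_sunday    = c(0, 0, 0)
)

# Select the seven indicator columns by their shared prefix.
weekday_cols <- grep("^weekday_is_", names(toy), value = TRUE)
day_names <- sub("^weekday_is_", "", weekday_cols)

# Multiplying by 1 turns logical indicators into numbers; max.col() then
# returns, for each row, the position of the column holding the largest value.
indicators <- as.matrix(toy[weekday_cols]) * 1

# Sanity check: every row should have exactly one indicator set.
stopifnot(all(rowSums(indicators) == 1))

weekday <- factor(
  day_names[max.col(indicators, ties.method = "first")],
  levels = day_names
)

print(weekday)
```

The same pattern would apply to the data_channel_is_* columns only if every article belongs to exactly one channel, which this page does not state, so the rowSums() check is worth keeping before relying on the result.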