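Basic information about this dataset:
Dataset size: 2,473 records
Platform: Shopee_TW
Categories: 美妝保健 (Beauty & Health), 居家生活 (Home & Living), 女生配件 (Women's Accessories), 寵物 (Pets), 女鞋 (Women's Shoes), 男鞋 (Men's Shoes), 女生包包-精品 (Women's Bags & Luxury), 男生包包與配件 (Men's Bags & Accessories)
Filter: items with monthly sales above 500
Exchange rate: 1 TWD = 0.2174 RMB
Data source: shopee_tw_no_limit_price.csv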
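As a quick illustration of how the exchange rate above is applied, the sketch below converts a TWD price to RMB. The helper name `twd_to_rmb` and the sample price are illustrative assumptions and do not come from the notebook.

```python
# Exchange rate stated in the dataset summary: 1 TWD = 0.2174 RMB.
CURRENCY_RATE = 0.2174


def twd_to_rmb(price_twd, rate=CURRENCY_RATE):
    """Convert a price in TWD to RMB (hypothetical helper, for illustration only)."""
    return price_twd * rate


# A hypothetical item priced at 100 TWD.
print(round(twd_to_rmb(100), 2))  # 21.74
```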
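Among the items with monthly sales above 500 in the eight categories (美妝保健, 居家生活, 女生配件, 寵物, 女鞋, 男鞋, 女生包包-精品, 男生包包與配件), the subcategory analysis covers the two categories 居家生活 and 美妝保健.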
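Within 居家生活, the subcategories 日用品 (daily necessities) and 收納 (storage) have the largest shares, at 27.19% and 13.59% respectively; 日用品 is the largest, with 292 items.
Within 美妝保健, the subcategories 開架流行清潔保養 (drugstore cleansing and skincare) and 私密清潔保養 (intimate care) have the largest shares, at 18.26% and 14.03% respectively; 開架流行清潔保養 is the largest, with 134 items.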
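Shares like the ones quoted above can be derived from item counts per subcategory. The following is a minimal sketch on made-up data; only the column name `main class` is taken from the notebook, and the subcategory labels are placeholders.

```python
import pandas as pd

# Hypothetical toy data; the labels 'A', 'B', 'C' stand in for real subcategories.
toy = pd.DataFrame({'main class': ['A', 'A', 'B', 'C']})

counts = toy['main class'].value_counts()          # items per subcategory
shares = (counts / counts.sum() * 100).round(2)    # share of the category, in percent

summary = pd.DataFrame({'count': counts, 'share_pct': shares})
print(summary)
```

Applied to one category's rows, the `share_pct` column would give the percentages and the `count` column the item totals.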
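The concentrated region (price between 0 and 200 RMB, monthly sales between 500 and 10,000) is analyzed for four categories: 美妝保健, 居家生活, 女生配件, and 寵物. The other categories have too few records and are not analyzed.
美妝保健: prices cluster between 0 and 30 RMB, monthly sales between 500 and 2,000
居家生活: prices cluster between 0 and 60 RMB, monthly sales between 500 and 4,000
女生配件: prices cluster between 0 and 10 RMB, monthly sales between 500 and 2,000
寵物: prices cluster between 0 and 20 RMB, monthly sales between 500 and 2,000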
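The ranges above were read off the scatter charts. One way to put numbers on "cluster" is to restrict the data to the concentrated region and take quantiles per category. The sketch below uses made-up rows; the column names match the notebook, while the 10%/90% quantile choice is an assumption and not the notebook's method.

```python
import pandas as pd

CURRENCY_RATE = 0.2174  # 1 TWD = 0.2174 RMB, as stated in the dataset summary

# Hypothetical rows; only the column names come from the notebook.
toy = pd.DataFrame({
    'category': ['寵物', '寵物', '寵物', '女鞋'],
    'price_max': [50, 80, 2000, 300],            # prices in TWD
    'monthly sales': [600, 1500, 900, 20000],
})

price_rmb = toy['price_max'] * CURRENCY_RATE
# Concentrated region: price under 200 RMB, monthly sales from 500 to 10000.
in_focus = (price_rmb < 200) & toy['monthly sales'].between(500, 10000)

focus = toy[in_focus].assign(price_rmb=price_rmb[in_focus])
# Middle 80% of price and sales per category (assumed definition of "cluster").
ranges = focus.groupby('category')[['price_rmb', 'monthly sales']].quantile([0.1, 0.9])
print(ranges)
```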
** Note: the highest monthly sales figure for 男鞋 is 74,518, which looks anomalous. On checking, that record is indeed invalid. Product link
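Preview the first 10 rows of the dataset.
It has 9 fields: category, subcategory, item name, price information, lowest price, highest price, monthly sales, item link, and main image link.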
import numpy as np
import pandas as pd
data_file = 'shopee_tw_no_limit_price.csv'
data = pd.read_csv(data_file)
data.head(10)
# Exchange rate: 1 TWD = 0.2174 RMB
currency_rate = 0.2174
print('\n---------------Dataset statistics------------------\n')
fields = data.columns.tolist()
print('Data source: {}'.format(data_file))
print('Analyzing the dataset......')
print('Dataset fields: {}'.format(fields))
category_name = data['category'].unique()
print('Top-level categories: {}'.format(category_name))
print('Subcategories: {}'.format(data['main class'].unique()))
# Keep only the columns needed for the per-category statistics
data_section = data[['category','main class','price_max','monthly sales']]
data_name = ['price','monthly sales','item count','max sales','subcategories']
# Variable-name prefixes, one per category; this assumes the same order as category_name
var_name = ['beauty','houseware','girlaccessory','pets','womenshoes','menshoes','womenbags','menbags']
print('\n---------------Per-category statistics-------------------\n')
# In a notebook cell, locals() is the module namespace, so the assignments below create global variables
createVar = locals()
for i in range(len(category_name)):
    print('Collecting statistics for {}: '.format(category_name[i]), ', '.join(data_name))
    subset = data_section[data_section['category'] == category_name[i]]
    createVar[var_name[i]] = subset
    createVar[var_name[i]+'_pricelist'] = subset['price_max'] * currency_rate  # price in RMB
    createVar[var_name[i]+'_saleslist'] = subset['monthly sales']
    # len() gives one item count; DataFrame.count() would return one value per column
    createVar[var_name[i]+'_num'] = len(subset)
    createVar[var_name[i]+'_salesmax'] = subset['monthly sales'].max()
    createVar[var_name[i]+'_class'] = subset['main class'].unique()
    createVar[var_name[i]+'_class_num'] = [len(subset[subset['main class'] == a]) for a in createVar[var_name[i]+'_class']]
    # Concentrated region: price below 200 RMB and monthly sales below 10000
    in_focus = (subset['price_max'] * currency_rate < 200) & (subset['monthly sales'] < 10000)
    createVar[var_name[i]+'_price_select'] = subset.loc[in_focus, 'price_max'] * currency_rate
    createVar[var_name[i]+'_sales_select'] = subset.loc[in_focus, 'monthly sales']
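The cell above creates variables dynamically through `locals()`, which the later chart cells rely on. As an alternative sketch (not the notebook's approach), the same per-category statistics can be kept in a plain dictionary keyed by category name. The rows and the subcategory labels `sub_a`, `sub_b`, `sub_c` below are made up; the column names match the notebook.

```python
import pandas as pd

CURRENCY_RATE = 0.2174  # 1 TWD = 0.2174 RMB

# Hypothetical rows for illustration only.
toy = pd.DataFrame({
    'category': ['寵物', '寵物', '女鞋'],
    'main class': ['sub_a', 'sub_b', 'sub_c'],
    'price_max': [100, 200, 500],
    'monthly sales': [800, 1200, 650],
})

stats = {}
# sort=False keeps categories in their order of first appearance.
for name, group in toy.groupby('category', sort=False):
    stats[name] = {
        'prices_rmb': group['price_max'] * CURRENCY_RATE,
        'sales': group['monthly sales'],
        'num': len(group),
        'sales_max': group['monthly sales'].max(),
        'class_counts': group['main class'].value_counts(sort=False),
    }

print(stats['寵物']['num'], stats['寵物']['sales_max'])
```

A dictionary avoids depending on the order of `var_name` matching the order of the categories in the data, at the cost of rewriting the chart cells to read from `stats`.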
from pyecharts import Pie
from pyecharts import online
online()
name = category_name
value = [beauty_num,houseware_num,girlaccessory_num,pets_num,womenshoes_num,menshoes_num,womenbags_num,menbags_num]
pie = Pie('Share of the eight categories',title_pos='center',width='900')
pie.add('Item count',name,value,radius=[40, 70],label_text_color=None,is_label_show=True,legend_orient="vertical",legend_pos="left")
pie
from pyecharts import WordCloud
wordcloud = WordCloud(width = 900,height = 300)
wordcloud.add('Item-count share of the eight categories',name,value,word_size_range = [20,60], shape='pentagon')
wordcloud
from pyecharts import Pie
pie = Pie('居家生活',title_pos='center',width='900')
pie.add('Item count',houseware_class,houseware_class_num,radius=[40, 70],label_text_color=None,is_label_show=True,legend_orient="vertical",legend_pos="left")
pie
from pyecharts import WordCloud
wordcloud = WordCloud(width = 900,height = 300)
wordcloud.add('',houseware_class,houseware_class_num,word_size_range = [20,60], shape='pentagon')
wordcloud
from pyecharts import Pie
pie = Pie('美妝保健',title_pos='center',width='900')
pie.add('Item count',beauty_class,beauty_class_num,radius=[40, 70],label_text_color=None,is_label_show=True,legend_orient="vertical",legend_pos="left")
pie
from pyecharts import WordCloud
wordcloud = WordCloud(width = 900,height = 400)
wordcloud.add('',beauty_class,beauty_class_num,word_size_range = [20,60], shape='diamond')
wordcloud
from pyecharts import Scatter
scatter = Scatter('Price & sales distribution','The eight categories',width='900')
# Add one series per category: price in RMB on the x-axis, monthly sales on the y-axis
for i in range(len(category_name)):
    scatter.add(category_name[i],createVar[var_name[i]+'_pricelist'],createVar[var_name[i]+'_saleslist'],xaxis_name ='price (rmb)',yaxis_name = 'sales',yaxis_name_gap=50,legend_top='7%',yaxis_min=500)
scatter
The eight categories
Because some categories have too few records, the analysis below covers four categories: 美妝保健, 居家生活, 女生配件, and 寵物.
from pyecharts import Timeline
# Build one scatter chart per category, restricted to the concentrated region
createScat = locals()
for i in range(len(category_name))[:4]:
    createScat['scatter_'+str(i)] = Scatter('Price & sales distribution in the concentrated region: {}'.format(category_name[i]),'美妝保健-居家生活-女生配件-寵物',width='900')
    createScat['scatter_'+str(i)].add(category_name[i],createVar[var_name[i]+'_price_select'],createVar[var_name[i]+'_sales_select'],yaxis_name ='sales',xaxis_name = 'price (rmb)',yaxis_name_gap=50,xaxis_name_gap=40,yaxis_min=500,legend_top='7%')
# Cycle through the four charts on a timeline
timeline = Timeline(is_auto_play=True, timeline_bottom=-5)
for i in range(len(category_name))[:4]:
    timeline.add(createScat['scatter_'+str(i)],'{}'.format(category_name[i]))
timeline
from pyecharts import Boxplot,Grid
boxplot_1 = Boxplot('Price extremes')
x_axis_1 = data['category'].unique()
y_axis_1 = [createVar[var_name[i]+'_pricelist'] for i in range(len(category_name))]
# prepare_data returns [min, Q1, median, Q3, max] for each series; the original called it on an undefined name `boxplot`
price_stats = boxplot_1.prepare_data(y_axis_1)
boxplot_1.add("Price", x_axis_1, price_stats,xaxis_rotate = 30,legend_pos='20%')
boxplot_2 = Boxplot('Sales extremes',title_pos='45%')
x_axis_2 = data['category'].unique()
y_axis_2 = [createVar[var_name[i]+'_saleslist'] for i in range(len(category_name))]
sales_stats = boxplot_2.prepare_data(y_axis_2)
boxplot_2.add("Sales", x_axis_2, sales_stats,xaxis_rotate = 30,legend_pos ='70%')
# Print the five-number summary for each category
for i in range(len(price_stats)):
    print(x_axis_1[i])
    print("Price --- min: {0[0]} / Q1 (lower quartile): {0[1]} / median (Q2): {0[2]} / Q3 (upper quartile): {0[3]} / max: {0[4]}".format(price_stats[i]))
    print("Sales --- min: {0[0]} / Q1 (lower quartile): {0[1]} / median (Q2): {0[2]} / Q3 (upper quartile): {0[3]} / max: {0[4]}".format(sales_stats[i]))
grid = Grid(width=990)
grid.add(boxplot_1,grid_right='53%')
grid.add(boxplot_2,grid_left='53%')
grid
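To cross-check the five-number summaries printed above without pyecharts, a similar summary can be computed with NumPy. This is a sketch on made-up values; the helper name `five_number_summary` is an assumption, and NumPy's default linear interpolation may give slightly different quartiles than the convention used by `prepare_data`.

```python
import numpy as np


def five_number_summary(values):
    """Return [min, Q1, median, Q3, max] using NumPy's default linear percentiles."""
    arr = np.asarray(values, dtype=float)
    return [float(x) for x in np.percentile(arr, [0, 25, 50, 75, 100])]


# Hypothetical monthly sales figures.
print(five_number_summary([500, 600, 700, 800, 900]))
# [500.0, 600.0, 700.0, 800.0, 900.0]
```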
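The eight categories: 美妝保健, 居家生活, 女生配件, 寵物, 女鞋, 男鞋, 女生包包-精品, 男生包包與配件
** Note: the highest monthly sales figure for 男鞋 is 74,518, which looks anomalous. On checking, that record is indeed invalid. Product link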
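The anomalous record was spotted by eye and then checked manually. As a sketch of how such candidates could be flagged automatically, the example below applies Tukey's 1.5 × IQR rule to made-up sales values plus the 74518 figure mentioned above. The rule only suggests records to inspect; it does not decide whether a record is valid, and it is not part of the original notebook.

```python
import pandas as pd

# Hypothetical sales values, plus the 74518 figure from the note above.
sales = pd.Series([520, 610, 700, 850, 1200, 1500, 74518])

q1, q3 = sales.quantile(0.25), sales.quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)   # Tukey's rule for unusually high values

suspects = sales[sales > upper_fence]
print(suspects.tolist())  # [74518]
```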