python - Pandas: replace column values based on a match from another column

Python 2019-04-30

I have a column, df1["ItemType"], in my first dataframe, as shown below.

Dataframe1

ItemType1
redTomato
whitePotato
yellowPotato
greenCauliflower
yellowCauliflower
yelloSquash
redOnions
YellowOnions
WhiteOnions
yellowCabbage
GreenCabbage

I need to replace its values using a dictionary created from another dataframe.

Dataframe2

ItemType2            newType
whitePotato          Potato
yellowPotato         Potato
redTomato            Tomato
yellowCabbage
GreenCabbage
yellowCauliflower    yellowCauliflower
greenCauliflower     greenCauliflower
YellowOnions         Onions
WhiteOnions          Onions
yelloSquash          Squash
redOnions            Onions

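Please note:

- In dataframe2, some of the ItemType values map to a newType that is identical to the ItemType in dataframe1.
- Some of the ItemType values in dataframe2 have a null newType (the two Cabbage rows).
- The ItemType values in dataframe2 are in a different order from the ItemType values in dataframe1.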

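I need to replace each value in the Dataframe1 ItemType column with the newType from Dataframe2 wherever it matches a Dataframe2 ItemType, keeping in mind the exceptions listed in the bullet points above.
If there is no match, the value must stay as it is (no change).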
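For anyone who wants to reproduce the problem, the two tables above can be rebuilt as follows. This is only a sketch: it assumes the empty newType cells are missing values (NaN), which is how they would typically arrive from `pd.read_csv`, and it uses the column names ItemType1 and ItemType2.

```python
import numpy as np
import pandas as pd

# Dataframe1: the column whose values should be replaced
Dataframe1 = pd.DataFrame({
    "ItemType1": [
        "redTomato", "whitePotato", "yellowPotato", "greenCauliflower",
        "yellowCauliflower", "yelloSquash", "redOnions", "YellowOnions",
        "WhiteOnions", "yellowCabbage", "GreenCabbage",
    ]
})

# Dataframe2: the lookup table, in a different row order.
# Assumption: the two Cabbage rows have a missing (NaN) newType.
Dataframe2 = pd.DataFrame({
    "ItemType2": [
        "whitePotato", "yellowPotato", "redTomato", "yellowCabbage",
        "GreenCabbage", "yellowCauliflower", "greenCauliflower",
        "YellowOnions", "WhiteOnions", "yelloSquash", "redOnions",
    ],
    "newType": [
        "Potato", "Potato", "Tomato", np.nan, np.nan,
        "yellowCauliflower", "greenCauliflower",
        "Onions", "Onions", "Squash", "Onions",
    ],
})

print(Dataframe1)
print(Dataframe2)
```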

This is what I have so far.

import pandas as pd

# read the second CSV file
df2 = pd.read_csv('mappings.csv', names=["ItemType", "newType"])
# convert to dict
df2 = df2.set_index('ItemType').T.to_dict('list')

The match-and-replace attempts given below do not work. They insert NaN values instead of the actual values. They are based on discussions on SO.

df1.loc[df1['ItemType'].isin(df2['ItemType'])]=df2[['NewType']]

or

df1['ItemType']=df2['ItemType'].map(df2)

Thanks in advance.

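Edit
The column headers in the two dataframes have different names. The dataframe1 column is ItemType1, and the first column in the second dataframe is ItemType2. I missed that in the first edit.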

Solution

Use map

All the logic you need:

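def update_type(t1, t2, dropna=False):
    # Map t1 through the lookup Series t2; unmatched or null lookups become NaN,
    # which are then either dropped or filled back with the original t1 values.
    return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)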
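To see why the `fillna(t1)` step matters, here is a small sketch of what `Series.map` does with a lookup Series. The value `purpleCarrot` is a made-up entry that is not in the question's data; it stands in for a value with no match at all.

```python
import numpy as np
import pandas as pd

# Three values: one with a mapping, one whose mapping is null,
# and one (hypothetical) that does not appear in the lookup at all.
t1 = pd.Series(["redTomato", "yellowCabbage", "purpleCarrot"], name="ItemType1")

# Lookup Series: index = old value, values = new value (NaN = no new type)
t2 = pd.Series({"redTomato": "Tomato", "yellowCabbage": np.nan})

mapped = t1.map(t2)          # both the null mapping and the missing key give NaN
kept = mapped.fillna(t1)     # NaN positions fall back to the original value
dropped = mapped.dropna()    # NaN positions are removed instead

print(mapped)
print(kept)
print(dropped)
```

With `fillna(t1)` the result has the same length as the input, so it can be assigned straight back to the column; with `dropna()` the result is shorter and keeps only the rows that were actually mapped.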

Make 'ItemType2' the index of Dataframe2:

update_type(Dataframe1.ItemType1,Dataframe2.set_index('ItemType2').newType)

0                Tomato
1                Potato
2                Potato
3      greenCauliflower
4     yellowCauliflower
5                Squash
6                Onions
7                Onions
8                Onions
9         yellowCabbage
10         GreenCabbage
Name: ItemType1, dtype: object

update_type(Dataframe1.ItemType1,Dataframe2.set_index('ItemType2').newType,dropna=True)

0                Tomato
1                Potato
2                Potato
3      greenCauliflower
4     yellowCauliflower
5                Squash
6                Onions
7                Onions
8                Onions
Name: ItemType1, dtype: object

Verify

updated = update_type(Dataframe1.ItemType1,Dataframe2.set_index('ItemType2').newType)

pd.concat([Dataframe1,updated],axis=1,keys=['old','new'])
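A self-contained variant of this check is sketched below. It rebuilds the question's data (assuming the empty newType cells are NaN) and, instead of `pd.concat` with keys, places the old and new values in a plain two-column frame with an extra `changed` flag. The `check` frame and the `changed` column are illustrative additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

def update_type(t1, t2, dropna=False):
    mapped = t1.map(t2)
    return mapped.dropna() if dropna else mapped.fillna(t1)

Dataframe1 = pd.DataFrame({"ItemType1": [
    "redTomato", "whitePotato", "yellowPotato", "greenCauliflower",
    "yellowCauliflower", "yelloSquash", "redOnions", "YellowOnions",
    "WhiteOnions", "yellowCabbage", "GreenCabbage",
]})

# Assumption: the empty newType cells are missing values (NaN)
Dataframe2 = pd.DataFrame({
    "ItemType2": [
        "whitePotato", "yellowPotato", "redTomato", "yellowCabbage",
        "GreenCabbage", "yellowCauliflower", "greenCauliflower",
        "YellowOnions", "WhiteOnions", "yelloSquash", "redOnions",
    ],
    "newType": [
        "Potato", "Potato", "Tomato", np.nan, np.nan,
        "yellowCauliflower", "greenCauliflower",
        "Onions", "Onions", "Squash", "Onions",
    ],
})

updated = update_type(Dataframe1["ItemType1"],
                      Dataframe2.set_index("ItemType2")["newType"])

# Old and new values side by side, plus a flag for rows that changed
check = pd.DataFrame({"old": Dataframe1["ItemType1"], "new": updated})
check["changed"] = check["old"] != check["new"]
print(check)
```

Rows with a null newType (the Cabbage entries) and rows that map to themselves (the Cauliflower entries) should show `changed` as False.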

Timing

def root(Dataframe1,Dataframe2):
    return Dataframe1['ItemType1'].replace(Dataframe2.set_index('ItemType2')['newType'].dropna())

def piRSquared(Dataframe1,Dataframe2):
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1,t2)
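The timing figures themselves are not reproduced here. One way to run the comparison yourself is sketched below with the standard-library `timeit` module. The repetition factor (1000) and the number of runs (5) are arbitrary choices for illustration, and the empty newType cells are again assumed to be NaN.

```python
import timeit

import numpy as np
import pandas as pd

def update_type(t1, t2, dropna=False):
    mapped = t1.map(t2)
    return mapped.dropna() if dropna else mapped.fillna(t1)

def root(Dataframe1, Dataframe2):
    # replace-based approach: drop null mappings, then replace matching values
    lookup = Dataframe2.set_index('ItemType2')['newType'].dropna()
    return Dataframe1['ItemType1'].replace(lookup)

def piRSquared(Dataframe1, Dataframe2):
    # map-based approach: map, then fill unmatched rows with the original value
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1, t2)

Dataframe1 = pd.DataFrame({"ItemType1": [
    "redTomato", "whitePotato", "yellowPotato", "greenCauliflower",
    "yellowCauliflower", "yelloSquash", "redOnions", "YellowOnions",
    "WhiteOnions", "yellowCabbage", "GreenCabbage",
]})
Dataframe2 = pd.DataFrame({
    "ItemType2": [
        "whitePotato", "yellowPotato", "redTomato", "yellowCabbage",
        "GreenCabbage", "yellowCauliflower", "greenCauliflower",
        "YellowOnions", "WhiteOnions", "yelloSquash", "redOnions",
    ],
    "newType": [
        "Potato", "Potato", "Tomato", np.nan, np.nan,
        "yellowCauliflower", "greenCauliflower",
        "Onions", "Onions", "Squash", "Onions",
    ],
})

# Repeat the small frame so the timings are measurable (arbitrary factor)
big = pd.concat([Dataframe1] * 1000, ignore_index=True)

t_root = timeit.timeit(lambda: root(big, Dataframe2), number=5)
t_map = timeit.timeit(lambda: piRSquared(big, Dataframe2), number=5)
print(f"replace: {t_root:.4f}s   map + fillna: {t_map:.4f}s")
```

Absolute numbers depend on the machine, the pandas version, and the data size, so they are best compared against each other rather than read as fixed results.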
