
      Normalize a column of a dataframe using min-max normalization based on a groupby of another column

                Question

                The dataframe looks like this:

                Name     Job      Salary
                john   painter    40000
                peter  engineer   50000
                sam     plumber   30000
                john    doctor    500000
                john    driver    20000
                sam    carpenter  10000
                peter  scientist  100000
                

                How can I group by the Name column and apply the normalization to the Salary column within each group?

                Expected result:

                Name     Job      Salary
                john   painter    0.041666
                peter  engineer   0
                sam     plumber   1
                john    doctor    1
                john    driver    0
                sam    carpenter  0
                peter  scientist  1
                

                I have tried the following:

                data = df.groupby('Name').transform(lambda x: (x - x.min()) / x.max()- x.min())
                

                However, this produces:

                         Salary
                0 -19999.960000
                1 -50000.000000
                2  -9999.333333
                3 -19999.040000
                4 -20000.000000
                5 -10000.000000
                6 -49999.500000
                

                Recommended answer

                You were almost there.

                >>> df                                                                                                                 
                    Name        Job  Salary
                0   john    painter   40000
                1  peter   engineer   50000
                2    sam    plumber   30000
                3   john     doctor  500000
                4   john     driver   20000
                5    sam  carpenter   10000
                6  peter  scientist  100000
                >>>                                                                                                                    
                >>> result = df.assign(Salary=df.groupby('Name')['Salary'].transform(lambda x: (x - x.min()) / (x.max() - x.min())))
                >>> # alternatively, df['Salary'] = df.groupby(... if you don't need a new frame       
                >>> result                                                                                                               
                    Name        Job    Salary
                0   john    painter  0.041667
                1  peter   engineer  0.000000
                2    sam    plumber  1.000000
                3   john     doctor  1.000000
                4   john     driver  0.000000
                5    sam  carpenter  0.000000
                6  peter  scientist  1.000000
                

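                So basically, you just forgot to wrap x.max() - x.min() in parentheses. Division binds tighter than subtraction, so the original expression computed ((x - x.min()) / x.max()) - x.min().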
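To make the precedence issue concrete, here is a minimal sketch in plain Python that uses the three salaries of the john group from the question. The variable names are only illustrative and are not part of the original answer.

```python
# Salaries in the "john" group from the question.
salaries = [40000, 500000, 20000]
lo, hi = min(salaries), max(salaries)

x = 40000  # john / painter

# Without parentheses, division binds tighter than subtraction, so this is
# ((x - lo) / hi) - lo and not a division by the range.
wrong = (x - lo) / hi - lo

# With parentheses, the whole range (hi - lo) is the denominator.
right = (x - lo) / (hi - lo)

print(wrong)            # -19999.96, matches row 0 of the faulty output
print(round(right, 6))  # 0.041667, matches row 0 of the expected result
```

The first value reproduces the large negative number seen in the question, which is a quick way to confirm that the missing parentheses were the only problem.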


                Note that this can be done faster with a series of vectorized operations.

                >>> grouper = df.groupby('Name')['Salary']                                                                             
                >>> maxes = grouper.transform('max')                                                                                   
                >>> mins = grouper.transform('min')                                                                                    
                >>>                                                                                                                    
                >>> result = df.assign(Salary=(df.Salary - mins)/(maxes - mins))                                                       
                >>> result                                                                                                             
                    Name        Job    Salary
                0   john    painter  0.041667
                1  peter   engineer  0.000000
                2    sam    plumber  1.000000
                3   john     doctor  1.000000
                4   john     driver  0.000000
                5    sam  carpenter  0.000000
                6  peter  scientist  1.000000
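For readers who want to run this outside an interactive session, the vectorized approach can be sketched as a small standalone script. The helper name `minmax_by_group` is hypothetical and introduced here only for illustration; the dataframe is rebuilt from the table in the question.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    "Name": ["john", "peter", "sam", "john", "john", "sam", "peter"],
    "Job": ["painter", "engineer", "plumber", "doctor",
            "driver", "carpenter", "scientist"],
    "Salary": [40000, 50000, 30000, 500000, 20000, 10000, 100000],
})


def minmax_by_group(frame, group_col, value_col):
    """Hypothetical helper: min-max scale value_col within each group."""
    grouped = frame.groupby(group_col)[value_col]
    mins = grouped.transform("min")    # per-row minimum of the row's group
    maxes = grouped.transform("max")   # per-row maximum of the row's group
    return (frame[value_col] - mins) / (maxes - mins)


vectorized = minmax_by_group(df, "Name", "Salary")

# The lambda-based version, restricted to the Salary column, for comparison.
with_lambda = df.groupby("Name")["Salary"].transform(
    lambda x: (x - x.min()) / (x.max() - x.min())
)

result = df.assign(Salary=vectorized)
print(result)
```

Both versions should agree on this data. One caveat worth keeping in mind: if every salary in a group is identical, the range is zero and the result for that group is NaN, so such groups may need separate handling.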
                

                Timings:

                >>> # Setup
                >>> df = pd.concat([df]*1000, ignore_index=True)                                                                       
                >>> df.Name = np.arange(len(df)//4).repeat(4) # 4 rows per name
                >>> df                                                                                                                 
                      Name        Job  Salary
                0        0    painter   40000
                1        0   engineer   50000
                2        0    plumber   30000
                3        0     doctor  500000
                4        1     driver   20000
                ...    ...        ...     ...
                6995  1748    plumber   30000
                6996  1749     doctor  500000
                6997  1749     driver   20000
                6998  1749  carpenter   10000
                6999  1749  scientist  100000
                
                [7000 rows x 3 columns]
                >>>
                >>> # Tests @ i5-6200U CPU @ 2.30GHz
                >>> %timeit df.groupby('Name').transform(lambda x: (x - x.min()) / (x.max()- x.min()))                                 
                1.19 s ± 20.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
                >>> %%timeit 
                ...: grouper = df.groupby('Name')['Salary'] 
                ...: maxes = grouper.transform('max') 
                ...: mins = grouper.transform('min') 
                ...: (df.Salary - mins)/(maxes - mins) 
                ...:  
                ...:                                                                                                                   
                3.04 ms ± 94.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
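The comparison can also be reproduced without IPython. The following is a rough sketch using the standard library `timeit` module; the function names are illustrative, the lambda version is restricted to the Salary column here, and the absolute numbers will differ from the ones above depending on hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    "Name": ["john", "peter", "sam", "john", "john", "sam", "peter"],
    "Job": ["painter", "engineer", "plumber", "doctor",
            "driver", "carpenter", "scientist"],
    "Salary": [40000, 50000, 30000, 500000, 20000, 10000, 100000],
})

# Same setup as in the timings: 7000 rows, 1750 integer keys, 4 rows each.
df = pd.concat([base] * 1000, ignore_index=True)
df["Name"] = np.arange(len(df) // 4).repeat(4)


def with_lambda():
    # Calls the Python lambda once per group.
    return df.groupby("Name")["Salary"].transform(
        lambda x: (x - x.min()) / (x.max() - x.min())
    )


def vectorized():
    # Two built-in group reductions, then whole-column arithmetic.
    grouped = df.groupby("Name")["Salary"]
    mins = grouped.transform("min")
    maxes = grouped.transform("max")
    return (df["Salary"] - mins) / (maxes - mins)


runs = 3
slow = timeit.timeit(with_lambda, number=runs) / runs
fast = timeit.timeit(vectorized, number=runs) / runs
print(f"lambda:     {slow * 1e3:.2f} ms per call")
print(f"vectorized: {fast * 1e3:.2f} ms per call")
```

The gap comes from the lambda being invoked once for each of the 1750 groups, while the vectorized version performs a fixed number of column-wide operations regardless of the group count.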
                
