                  How to spread a column in a Pandas data frame
                  This article describes how to spread a column in a Pandas data frame; it may be a useful reference for anyone facing the same problem.

                  Problem description

                  I have the following pandas data frame:

                  import pandas as pd
                  import numpy as np
                  df = pd.DataFrame({
                                 'fc': [100,100,112,1.3,14,125],
                                 'sample_id': ['S1','S1','S1','S2','S2','S2'],
                                 'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
                                 })
                  
                  df = df[['gene_symbol', 'sample_id', 'fc']]
                  df
                  

                  Which produces this:

                  Out[11]:
                    gene_symbol sample_id     fc
                  0           a        S1  100.0
                  1           b        S1  100.0
                  2           c        S1  112.0
                  3           a        S2    1.3
                  4           b        S2   14.0
                  5           c        S2  125.0
                  

                  How can I spread sample_id so that in the end I get this:

                  gene_symbol    S1   S2
                  a             100   1.3
                  b             100   14.0
                  c             112   125.0
                  

                  Solution

                  Use pivot or unstack:

                  #df = df[['gene_symbol', 'sample_id', 'fc']]
                  # keep the long-format df intact so the alternative below can start from it
                  wide = df.pivot(index='gene_symbol', columns='sample_id', values='fc')
                  print(wide)
                  sample_id       S1     S2
                  gene_symbol              
                  a            100.0    1.3
                  b            100.0   14.0
                  c            112.0  125.0
                  


                  df = df.set_index(['gene_symbol','sample_id'])['fc'].unstack(fill_value=0)
                  print (df)
                  sample_id       S1     S2
                  gene_symbol              
                  a            100.0    1.3
                  b            100.0   14.0
                  c            112.0  125.0
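
                  The two calls above give the same table here because every gene_symbol/sample_id pair is present. They differ when a pair is missing. The following is a minimal sketch, assuming a variant of the question's data with the (c, S2) row dropped; the names wide_pivot and wide_unstack are illustrative only:

```python
import pandas as pd

# Assumed variant of the question's data: the (c, S2) row is left out on purpose
df = pd.DataFrame({
    'gene_symbol': ['a', 'b', 'c', 'a', 'b'],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2'],
    'fc': [100, 100, 112, 1.3, 14],
})

# pivot leaves the missing (c, S2) cell as NaN
wide_pivot = df.pivot(index='gene_symbol', columns='sample_id', values='fc')

# unstack(fill_value=0) writes 0 into the missing cell instead
wide_unstack = df.set_index(['gene_symbol', 'sample_id'])['fc'].unstack(fill_value=0)

print(wide_pivot)
print(wide_unstack)
```

                  Whether 0 is a sensible stand-in for a missing measurement depends on the data; leaving NaN is often the safer default.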
                  

                  If some gene_symbol/sample_id pairs are duplicated, pivot fails. Use pivot_table instead, or aggregate with groupby before unstack. mean can be replaced with sum, median, and so on:

                  df = pd.DataFrame({
                                 'fc': [100,100,112,1.3,14,125, 100],
                                 'sample_id': ['S1','S1','S1','S2','S2','S2', 'S2'],
                                 'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
                                 })
                  print (df)
                        fc gene_symbol sample_id
                  0  100.0           a        S1
                  1  100.0           b        S1
                  2  112.0           c        S1
                  3    1.3           a        S2
                  4   14.0           b        S2
                  5  125.0           c        S2 <- same c, S2, different fc
                  6  100.0           c        S2 <- same c, S2, different fc
                  

                  df = df.pivot(index='gene_symbol',columns='sample_id',values='fc')
                  

                  ValueError: Index contains duplicate entries, cannot reshape

                  # keep the long-format df intact so the groupby alternative below can start from it
                  wide = df.pivot_table(index='gene_symbol', columns='sample_id', values='fc', aggfunc='mean')
                  print(wide)
                  sample_id       S1     S2
                  gene_symbol              
                  a            100.0    1.3
                  b            100.0   14.0
                  c            112.0  112.5
                  


                  df = df.groupby(['gene_symbol','sample_id'])['fc'].mean().unstack(fill_value=0)
                  print (df)
                  sample_id       S1     S2
                  gene_symbol              
                  a            100.0    1.3
                  b            100.0   14.0
                  c            112.0  112.5
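
                  Before choosing between pivot and pivot_table, it can help to look at the duplicated pairs directly. The following is a rough sketch, not part of the original answer: it uses DataFrame.duplicated on the two key columns and falls back to pivot_table with aggfunc='sum' (an arbitrary choice for illustration) only when duplicates exist. The names keys, dupes, and wide are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125, 100],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
})

keys = ['gene_symbol', 'sample_id']

# keep=False marks every row of a duplicated pair, not just the later ones
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

if dupes.empty:
    # no duplicated pairs: a plain pivot is enough
    wide = df.pivot(index='gene_symbol', columns='sample_id', values='fc')
else:
    # duplicated pairs: they must be aggregated; sum is used here as an example
    wide = df.pivot_table(index='gene_symbol', columns='sample_id',
                          values='fc', aggfunc='sum')
print(wide)
```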
                  

                  EDIT:

                  To clean up the result, set the columns name to None and call reset_index:

                  df.columns.name = None
                  df = df.reset_index()
                  print (df)
                    gene_symbol     S1     S2
                  0           a  100.0    1.3
                  1           b  100.0   14.0
                  2           c  112.0  112.5
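
                  The same cleanup can also be written as one chain without mutating the frame in place. This is a sketch under the assumption that rename_axis(None, axis=1) is an acceptable substitute for assigning df.columns.name; the name tidy is illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125, 100],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
})

tidy = (
    df.pivot_table(index='gene_symbol', columns='sample_id',
                   values='fc', aggfunc='mean')
      .rename_axis(None, axis=1)   # drop the 'sample_id' label on the columns
      .reset_index()               # turn gene_symbol back into a regular column
)
print(tidy)
```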
                  
