        PySpark - Adding a Column from a List of Values

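                  This article walks through how to add a column to a PySpark DataFrame from a list of values, and should be a useful reference for anyone facing the same problem. Follow along with the editors of 跟版网 below!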

                  Problem description

                  I need to add a column to a PySpark DataFrame based on a list of values.

                  a = spark.createDataFrame([("Dog", "Cat"), ("Cat", "Dog"), ("Mouse", "Cat")], ["Animal", "Enemy"])
                  

                  I have a list called rating, which holds the rating of each pet, in row order.

                  rating = [5, 4, 1]
                  

                  I need to append a column named Rating to the DataFrame so that the result looks like this:

                  +------+-----+------+
                  |Animal|Enemy|Rating|
                  +------+-----+------+
                  |   Dog|  Cat|     5|
                  |   Cat|  Dog|     4|
                  | Mouse|  Cat|     1|
                  +------+-----+------+
                  

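                  I tried the following, but the Rating column only ever contains the first value of the list:

                  from pyspark.sql.functions import udf
                  from pyspark.sql.types import IntegerType

                  # Attempt: pop the next rating for each row.
                  # This does not work as intended: the UDF, together with the `rating`
                  # list it references, is pickled and shipped to the executors, so
                  # popping inside the UDF never advances one shared list.
                  def add_labels():
                      return rating.pop(0)
                  
                  labels_udf = udf(add_labels, IntegerType())
                  
                  new_df = a.withColumn('Rating', labels_udf()).cache()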
                  

                  Output:

                  +------+-----+------+
                  |Animal|Enemy|Rating|
                  +------+-----+------+
                  |   Dog|  Cat|     5|
                  |   Cat|  Dog|     5|
                  | Mouse|  Cat|     5|
                  +------+-----+------+
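Why every row gets 5: Spark serializes the UDF together with the `rating` list it references and sends that copy to the executors, so `pop(0)` inside the UDF mutates a copy rather than one shared list. The snippet below is a simplified plain-Python simulation of that behavior, not Spark itself. The `run_task` helper is hypothetical, and the simulation assumes each row is processed by its own task, which is what the output above suggests.

```python
import pickle

rating = [5, 4, 1]

def add_labels(labels):
    # Mirrors the question's UDF: pop the first element of the list it sees.
    return labels.pop(0)

def run_task(serialized_labels):
    # Hypothetical stand-in for one Spark task: it unpickles its own copy of
    # the shipped state, so the pop never reaches the driver's list.
    labels = pickle.loads(serialized_labels)
    return add_labels(labels)

payload = pickle.dumps(rating)                    # state captured once on the "driver"
results = [run_task(payload) for _ in range(3)]   # one simulated task per row

print(results)  # [5, 5, 5]
print(rating)   # [5, 4, 1] (the driver's list is unchanged)
```

Because mutable state inside a UDF is not shared between rows, the values have to be attached by position in some other way, for example through a join on a row index.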
                  

                  Recommended answer

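                  from pyspark.sql import SparkSession, Window
                  from pyspark.sql.functions import monotonically_increasing_id, row_number
                  
                  spark = SparkSession.builder.getOrCreate()
                  
                  # sample data
                  a = spark.createDataFrame([("Dog", "Cat"), ("Cat", "Dog"), ("Mouse", "Cat")],
                                            ["Animal", "Enemy"])
                  a.show()
                  
                  # convert the list to a one-column DataFrame
                  rating = [5, 4, 1]
                  b = spark.createDataFrame([(r,) for r in rating], ["Rating"])
                  
                  # add a sequential 1-based index to both DataFrames and join on it.
                  # monotonically_increasing_id() is increasing but not consecutive, so
                  # row_number() over it turns it into 1, 2, 3, ... in the current row order.
                  # Note: a window without partitionBy moves all rows into one partition,
                  # which is fine for small data but does not scale.
                  w = Window.orderBy(monotonically_increasing_id())
                  a = a.withColumn("row_idx", row_number().over(w))
                  b = b.withColumn("row_idx", row_number().over(w))
                  
                  # joining on the column name keeps a single row_idx column, which is then dropped
                  final_df = a.join(b, on="row_idx").drop("row_idx")
                  final_df.show()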
                  

                  Input:

                  +------+-----+
                  |Animal|Enemy|
                  +------+-----+
                  |   Dog|  Cat|
                  |   Cat|  Dog|
                  | Mouse|  Cat|
                  +------+-----+
                  

                  Output (the join does not preserve the original row order, so rows may come back in a different order):

                  +------+-----+------+
                  |Animal|Enemy|Rating|
                  +------+-----+------+
                  |   Cat|  Dog|     4|
                  |   Dog|  Cat|     5|
                  | Mouse|  Cat|     1|
                  +------+-----+------+
                  
