如何从多个JSON对象中选择一个并在python中导航其层次结构
How to select a single JSON object out of several and navigate its hierarchy in python
我有一个包含几个javascript元素的网页。我只想访问一个名为SOURCE.pdp.propertyJSON
的属性,并以python的方式访问其属性。
编辑(为了可读性)的HTML源代码版本如下;以下是我的python代码:
任何指示将非常感激!
<script type="text/javascript">
SOURCE = SOURCE || {};
SOURCE.pdp = SOURCE.pdp || {};
SOURCE.pdp.propertyJSON = {
"neighborhood": "Westwood",
"neighborhoodId": 7187,
"zipCode": "90024",
"city": "Los Angeles",
"county": "Los Angeles",
"countyFIPS": "06037",
"stateCode": "CA",
"stateName": "California",
"type": "CONDO",
"typeDisplay": "Condo",
"numBedrooms": "2",
"numBathrooms": 2,
"numFullBathrooms": 2,
"numBeds": 2,
"indexSource": "Assessor",
"isForeclosure": false,
"isOpaqueBAL": false,
"foreclosureStatus": "",
"isSrpFeatured": false,
"price": null,
"sqft": 1321,
"formattedBedAndBath": "2bd, 2 full ba",
"formattedSqft": "1,321 sqft"
}
pdp_location_data = {
"neighborhood": {
"locationId": "87308",
"name": "Westwood",
"locationType": "neighborhood",
"altId": "7187"
},
"state": {
"locationId": "5",
"name": "California",
"locationType": "state",
"altId": "CA"
},
"county": {
"locationId": "57",
"name": "Los Angeles County",
"locationType": "county",
"altId": "06037"
},
"city": {
"locationId": "22637",
"name": "Los Angeles",
"locationType": "city",
"altId": "4396"
},
"zipCode": {
"locationId": "76090",
"name": "90024",
"locationType": "zipCode",
"altName": "90024",
"altId": "90024"
}
};
SOURCE.pdp.isCountySupportsValuation = true;
SOURCE.pdp.isInHighDemandRegion = false;
var _SPANLONG = pdp_location_data.longitude;
var _SPANLAT = pdp_location_data.latitude;
var _CENLONG = pdp_location_data.longitude;
var _CENLAT = pdp_location_data.latitude;
</script>
小心丑陋的巨蟒!
from bs4 import BeautifulSoup as bsoup
import requests as rq
url = 'https://www.SOURCE.com'
source_code = rq.get(url).text
soupcon = bsoup(source_code,"html.parser")
souper = soupcon.find_all('script', {'type': 'text/javascript'})
for line in souper:
if format(line).find('SOURCE.pdp.propertyJSON') != -1:
parts = format(line).split(',')
for var in parts:
if var.find('zipCode') != -1:
zipCode = var.split(':')[1].strip('"')
elif var.find('numBathrooms') != -1:
numBathrooms = var.split(':')[1].strip('"')
正如你所看到的,我目前正在访问我想要的JS对象,通过查找类型为text/javascript的所有脚本元素,遍历它们以找到包含我想要的对象的脚本,然后通过JS分隔符','
拆分整个脚本,并通过搜索它们来识别JS对象的元素。
可以使用json.loads:
将数据加载为字典from bs4 import BeautifulSoup as bsoup
import re
from json import loads
source = """<script type="text/javascript"> SOURCE = SOURCE || {};
SOURCE.pdp = SOURCE.pdp || {};
SOURCE.pdp.propertyJSON = { "neighborhood": "Westwood", "neighborhoodId": 7187, "zipCode": "90024", "city": "Los Angeles", "county": "Los Angeles", "countyFIPS": "06037", "stateCode": "CA", "stateName": "California", "type": "CONDO", "typeDisplay": "Condo", "numBedrooms": "2", "numBathrooms": 2, "numFullBathrooms": 2, "numBeds": 2, "indexSource": "Assessor", "isForeclosure": false, "isOpaqueBAL": false, "foreclosureStatus": "", "isSrpFeatured": false, "price": null, "sqft": 1321, "formattedBedAndBath": "2bd, 2 full ba", "formattedSqft": "1,321 sqft" } pdp_location_data = { "neighborhood": { "locationId": "87308", "name": "Westwood", "locationType": "neighborhood", "altId": "7187" }, "state": { "locationId": "5", "name": "California", "locationType": "state", "altId": "CA" }, "county": { "locationId": "57", "name": "Los Angeles County", "locationType": "county", "altId": "06037" }, "city": { "locationId": "22637", "name": "Los Angeles", "locationType": "city", "altId": "4396" }, "zipCode": { "locationId": "76090", "name": "90024", "locationType": "zipCode", "altName": "90024", "altId": "90024" } };
SOURCE.pdp.isCountySupportsValuation = true;
SOURCE.pdp.isInHighDemandRegion = false;
var _SPANLONG = pdp_location_data.longitude;
var _SPANLAT = pdp_location_data.latitude;
var _CENLONG = pdp_location_data.longitude;
var _CENLAT = pdp_location_data.latitude; </script>"""
soup = bsoup(source,"html.parser")
json_re = re.compile("SOURCE'.pdp'.propertyJSON's+='s+('{.*'})'s+pdp_location_data")
scr = soup.find("script", text=re.compile("SOURCE.pdp.propertyJSON")).text
js_raw = json_re.search(scr).group(1)
json_dict = loads(js_raw)
这将给你:
{u'numBeds': 2, u'neighborhood': u'Westwood', u'stateName': u'California', u'numFullBathrooms': 2, u'indexSource': u'Assessor', u'countyFIPS': u'06037', u'city': u'Los Angeles', u'isSrpFeatured': False, u'type': u'CONDO', u'formattedSqft': u'1,321 sqft', u'isOpaqueBAL': False, u'price': None, u'zipCode': u'90024', u'numBedrooms': u'2', u'neighborhoodId': 7187, u'county': u'Los Angeles', u'formattedBedAndBath': u'2bd, 2 full ba', u'sqft': 1321, u'numBathrooms': 2, u'stateCode': u'CA', u'isForeclosure': False, u'typeDisplay': u'Condo', u'foreclosureStatus': u''}
如果你想要pdp_location_data
json只是应用完全相同的逻辑
相关文章:
- 从json对象聚集数据并创建层次结构
- Html5-使用SVG路径绘制的组织层次结构在左手边被剪裁
- 使用jquery为移动布局更改html层次结构
- 如何在javascript中使用2个一维数组创建层次结构树
- 从d3.js中的csv创建树层次结构
- D3:使用 nest 函数将带有父键的平面数据转换为层次结构
- JSON 层次结构,如何获取元素
- 展平多个嵌套层次结构数组-d3.js
- Kendo UI层次结构DataSource架构不工作
- 用于构建树节点层次结构的javascript库
- 如何在JavaScript中显示注释线程层次结构
- js初学者-如何获得网页中所选内容的html以及整个节点层次结构
- Jquery选择层次结构
- 无法通过jQuery查找来定位层次结构中的对象
- Html5画布,继承层次结构可能导致闪烁
- 需要将一个层次结构的JSON转换为一个带行的html表
- 传递一个对象's的层次结构作为它的参数'的父函数
- 在Angular2的父/子组件层次结构中,你应该在什么时候订阅一个可观察对象?
- 如何从多个JSON对象中选择一个并在python中导航其层次结构
- 有没有一个工具可以创建我的 javascript 代码层次结构的可视化表示